In this version, the following will be performed:
Top-feature selection based on the trained models’ feature importance.
The results will depend on the number of CpGs selected and on the feature-selection method used.
The feature-selection methods serve two main purposes: one is binary classification, the other is multi-class classification.
Top features are selected from the trained models’ feature importance using different selection methods.
There are several selection methods, for example: mean feature importance, median (quantile) feature importance, and frequency / common feature importance.
Two data frames are output for use in the Pareto-optimal analysis.
One is the filtered data frame with the top number of features from each selection method.
The other is the phenotype data frame.
Finally, the performance of the features selected by each of the three methods is evaluated.
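As a sketch of how the three selection strategies differ, the following hypothetical R example aggregates per-model importance scores in the three ways named above. All names here (`importance_mat`, `top_n`, the toy CpG labels) are illustrative assumptions, not objects defined in this report:

```r
# Hypothetical sketch of the three aggregation strategies.
# importance_mat: rows = trained models, cols = candidate CpGs,
# entries = feature-importance scores from each model.
set.seed(123)
importance_mat <- matrix(abs(rnorm(5 * 8)), nrow = 5,
                         dimnames = list(NULL, paste0("cg", 1:8)))
top_n <- 3

# 1) Mean importance: average the score across models, keep the top N.
mean_imp <- colMeans(importance_mat)
top_mean <- names(sort(mean_imp, decreasing = TRUE))[1:top_n]

# 2) Median importance: robust to one model's outlier scores.
median_imp <- apply(importance_mat, 2, median)
top_median <- names(sort(median_imp, decreasing = TRUE))[1:top_n]

# 3) Frequency / common features: count how often a feature lands in
#    each model's own top-N list, then keep the most frequent ones.
per_model_top <- apply(importance_mat, 1, function(x)
  names(sort(x, decreasing = TRUE))[1:top_n])
freq <- sort(table(per_model_top), decreasing = TRUE)
top_freq <- names(freq)[1:top_n]
```

The three lists usually overlap but rarely coincide, which is why the evaluation step compares all three.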
This part is a collection of inputs; change them as needed.
csv_Ni1905FilePath<-"C:\\Users\\wangtia\\Desktop\\AD Risk\\DataSets\\ADNI_covariate_withEpiage_1905obs.csv"
TopSelectedCpGs_filePath<-"C:\\Users\\wangtia\\Desktop\\AD Risk\\DataSets\\Top5K_CpGs.csv"
# Number of top CpGs kept, based on standard deviation
Number_N_TopNCpGs<-params$INPUT_Number_N_TopNCpGs
# Go to the INPUT section and set "Impute_NA_FLAG_NUM":
# to impute NAs with the mean, set "Impute_NA_FLAG_NUM = 1"
# to impute NAs with the KNN method, set "Impute_NA_FLAG_NUM = 2"
Impute_NA_FLAG_NUM = 1
# Go to the INPUT section and set "METHOD_FEATURE_FLAG_NUM":
# for 3-class classification, set "METHOD_FEATURE_FLAG_NUM = 1"
# for the PCA method, set "METHOD_FEATURE_FLAG_NUM = 2"
# for 2-class classification, set "METHOD_FEATURE_FLAG_NUM = 3"
# for classification of CN vs AD, set "METHOD_FEATURE_FLAG_NUM = 4"
# for classification of CN vs MCI, set "METHOD_FEATURE_FLAG_NUM = 5"
# for classification of MCI vs AD, set "METHOD_FEATURE_FLAG_NUM = 6"
METHOD_FEATURE_FLAG_NUM = 3
# Go to the "INPUT" section to set the number of common features needed
# Generally this is for visualization
NUM_COMMON_FEATURES_SET = 20
NUM_COMMON_FEATURES_SET_Frequency = 20
Output flags for the feature-selection results:
# This is the flag for phenotype data output.
# If set to TRUE, the given path is checked: if no file exists there, the file is written; if the file already exists, nothing is written.
# If set to FALSE, the phenotype file is not written.
# NOTICE THAT : the phenotype file is selected from "Merged_df".
phenoOutPUt_FLAG = TRUE
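The write-if-absent behaviour described in the comments above can be sketched as a small helper (hypothetical; `pheno_df` and `pheno_path` stand in for the actual phenotype data frame and output path):

```r
# Hypothetical sketch of the phenotype-output logic described above:
# write the file only when the flag is TRUE and no file exists yet.
write_pheno_if_absent <- function(pheno_df, pheno_path, flag = TRUE) {
  if (!flag) {
    message("phenoOutPUt_FLAG is FALSE; phenotype file not written.")
    return(invisible(FALSE))
  }
  if (file.exists(pheno_path)) {
    message("Phenotype file already exists; nothing written.")
    return(invisible(FALSE))
  }
  write.csv(pheno_df, pheno_path, row.names = FALSE)
  invisible(TRUE)
}
```

Wiring `phenoOutPUt_FLAG` into the `flag` argument reproduces the described behaviour without ever overwriting an existing file.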
# For 8.0 Feature Selection and Output :
# NUM_FEATURES <- INPUT_NUMBER_FEATURES
# This is number of features needed
# Method_Selected_Choose <- INPUT_Method_Selected_Choose
# This is the method used for the output-stage feature selection
INPUT_NUMBER_FEATURES = params$INPUT_OUT_NUMBER_FEATURES
INPUT_Method_Mean_Choose = TRUE
INPUT_Method_Median_Choose = TRUE
INPUT_Method_Frequency_Choose = TRUE
if(INPUT_Method_Mean_Choose|| INPUT_Method_Median_Choose || INPUT_Method_Frequency_Choose){
OUTUT_file_directory<- "C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method3_BinaryClass_CN_vs_CI\\Method3_BinaryClass_SelectedFeatures\\"
OUTUT_CSV_PATHNAME <- paste(OUTUT_file_directory,"INPUT_",Number_N_TopNCpGs,"CpGs\\",sep="")
if (dir.exists(OUTUT_CSV_PATHNAME)) {
message("Directory already exists.")
} else {
dir.create(OUTUT_CSV_PATHNAME, recursive = TRUE)
message("Directory created.")
}
}
## Directory already exists.
FLAG_WRITE_METRICS_DF is the flag controlling whether to output the CSV containing the performance metrics.
# This flag controls output of this file's metrics, including the model-training-stage metrics and the performance metrics of the key features selected by mean, by median, and by frequency
Metrics_Table_Output_FLAG = TRUE
FLAG_WRITE_METRICS_DF = TRUE
if(FLAG_WRITE_METRICS_DF){
OUTUT_PerfMertics_directory<-"C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method3_BinaryClass_CN_vs_CI\\Method3_BinaryClass_PerformanceMetrics\\"
OUTUT_PerformanceMetricsCSV_PATHNAME <- paste(OUTUT_PerfMertics_directory,"INPUT_",Number_N_TopNCpGs,"CpGs_",INPUT_NUMBER_FEATURES,"SelFeature_PerMetrics.csv",sep="")
if (dir.exists(OUTUT_PerfMertics_directory)) {
message("Directory already exists.")
} else {
dir.create(OUTUT_PerfMertics_directory, recursive = TRUE)
message("Directory created.")
}
print(OUTUT_PerformanceMetricsCSV_PATHNAME)
}
## Directory already exists.
## [1] "C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method3_BinaryClass_CN_vs_CI\\Method3_BinaryClass_PerformanceMetrics\\INPUT_5000CpGs_250SelFeature_PerMetrics.csv"
Packages and libraries that may need to be installed and loaded.
# Function to check and install Bioconductor package: "limma"
install_bioc_packages <- function(packages) {
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
for (pkg in packages) {
if (!requireNamespace(pkg, quietly = TRUE)) {
BiocManager::install(pkg, dependencies = TRUE)
} else {
message(paste("Package", pkg, "is already installed."))
}
}
}
install_bioc_packages("limma")
## Package limma is already installed.
print("The required packages are all successfully installed.")
## [1] "The required packages are all successfully installed."
library(limma)
Set the seed for reproducibility.
set.seed(123)
csv_NI1905<-read.csv(csv_Ni1905FilePath)
csv_NI1905_RAW <- csv_NI1905
TopSelectedCpGs<-read.csv(TopSelectedCpGs_filePath, check.names = FALSE)
TopSelectedCpGs_RAW <- TopSelectedCpGs
head(csv_NI1905,n=3)
rownames(csv_NI1905)<-as.matrix(csv_NI1905[,"barcodes"])
dim(csv_NI1905)
## [1] 1905 23
dim(TopSelectedCpGs)
## [1] 5000 1921
head(TopSelectedCpGs[,1:8])
rownames(TopSelectedCpGs)<-TopSelectedCpGs[,1]
head(rownames(TopSelectedCpGs))
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"
head(colnames(TopSelectedCpGs))
## [1] "ProbeID" "200223270003_R01C01" "200223270003_R02C01" "200223270003_R03C01" "200223270003_R04C01" "200223270003_R05C01"
tail(colnames(TopSelectedCpGs))
## [1] "201046290111_R04C01" "201046290111_R05C01" "201046290111_R06C01" "201046290111_R07C01" "201046290111_R08C01" "sdDev"
This part adjusts the set of CpGs to use: it keeps the top N CpGs ranked by standard deviation.
sorted_TopSelectedCpGs <- TopSelectedCpGs[order(-TopSelectedCpGs$sdDev), ]
TopN_CpGs <- head(sorted_TopSelectedCpGs,Number_N_TopNCpGs )
TopN_CpGs_RAW<-TopN_CpGs
The variable “TopN_CpGs” will be used for processing the data. Now let’s take a look at it.
dim(TopN_CpGs)
## [1] 5000 1921
rownames(TopN_CpGs)<-TopN_CpGs[,1]
head(rownames(TopN_CpGs))
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"
head(colnames(TopN_CpGs))
## [1] "ProbeID" "200223270003_R01C01" "200223270003_R02C01" "200223270003_R03C01" "200223270003_R04C01" "200223270003_R05C01"
tail(colnames(TopN_CpGs))
## [1] "201046290111_R04C01" "201046290111_R05C01" "201046290111_R06C01" "201046290111_R07C01" "201046290111_R08C01" "sdDev"
Now, let’s check for duplicate sample IDs (“barcodes”):
Start with subjects who do not have a unique ID (“uniqueID = 0”):
library(dplyr)
dim(csv_NI1905[csv_NI1905$uniqueID == 0, ])
## [1] 1256 23
dim(csv_NI1905[csv_NI1905$uniqueID == 1, ])
## [1] 649 23
duplicates <- csv_NI1905[csv_NI1905$uniqueID == 0, ] %>%
group_by(barcodes) %>%
filter(n() > 1) %>%
ungroup()
print(dim(duplicates))
## [1] 0 23
rm(duplicates)
Based on the output dimensions, these records all have different sample IDs (“barcodes”).
Next, check across all records whether any sample IDs (“barcodes”) are duplicated.
duplicates <- csv_NI1905 %>%
group_by(barcodes) %>%
filter(n() > 1) %>%
ungroup()
print(dim(duplicates))
## [1] 0 23
From the output above, we can see the sample IDs (“barcodes”) are unique.
names(csv_NI1905)
## [1] "barcodes" "RID.a" "prop.B" "prop.NK" "prop.CD4T" "prop.CD8T" "prop.Mono" "prop.Neutro" "prop.Eosino" "DX" "age.now" "PTGENDER" "ABETA" "TAU"
## [15] "PTAU" "PC1" "PC2" "PC3" "ageGroup" "ageGroupsq" "DX_num" "uniqueID" "Horvath"
The same person may appear at multiple time points, so we keep only the records with a unique ID (“uniqueID = 1”).
csv_NI1905<-csv_NI1905[csv_NI1905$uniqueID == 1, ]
dim(csv_NI1905)
## [1] 649 23
Since “DX” will be the response variable, we first remove all rows with an NA value in the “DX” column.
# "DX" will be Y,remove all rows with NA value in "DX" column
csv_NI1905<-csv_NI1905 %>% filter(!is.na(DX))
We keep only the samples that appear in both datasets.
Matrix_sample_names_NI1905 <- as.matrix(csv_NI1905[,"barcodes"])
Matrix_sample_names_TopN_CpGs <- as.matrix(colnames(TopN_CpGs))
common_sample_names<-intersect(Matrix_sample_names_NI1905,Matrix_sample_names_TopN_CpGs)
csv_NI1905 <- csv_NI1905 %>% filter(barcodes %in% common_sample_names)
TopN_CpGs <- TopN_CpGs[, common_sample_names, drop = FALSE]
head(TopN_CpGs[,1:3],n=2)
dim(TopN_CpGs)
## [1] 5000 648
dim(csv_NI1905)
## [1] 648 23
Merge these two datasets and store the result in “merged_df”.
trans_TopN_CpGs<-t(TopN_CpGs)
# Check the total length of the rownames
# Recall that the sample names have been matched and neither data frame has duplicates
# Now, order the rownames and bind the data frames together. This ensures the rows of the merged data frame are correctly matched.
trans_TopN_CpGs_ordered<-trans_TopN_CpGs[order(rownames(trans_TopN_CpGs)),]
csv_NI1905_ordered<-csv_NI1905[order(rownames(csv_NI1905)),]
print("The rownames match in order:")
## [1] "The rownames match in order:"
check_1 = length(rownames(csv_NI1905_ordered))
check_2 = sum(rownames(csv_NI1905_ordered)==rownames(trans_TopN_CpGs_ordered))
print(check_1==check_2)
## [1] TRUE
merged_df_raw<-cbind(trans_TopN_CpGs_ordered,csv_NI1905_ordered)
phenotic_features_RAW<-colnames(csv_NI1905)
print(phenotic_features_RAW)
## [1] "barcodes" "RID.a" "prop.B" "prop.NK" "prop.CD4T" "prop.CD8T" "prop.Mono" "prop.Neutro" "prop.Eosino" "DX" "age.now" "PTGENDER" "ABETA" "TAU"
## [15] "PTAU" "PC1" "PC2" "PC3" "ageGroup" "ageGroupsq" "DX_num" "uniqueID" "Horvath"
phenoticPart_RAW <- merged_df_raw[,phenotic_features_RAW]
dim(phenoticPart_RAW)
## [1] 648 23
head(phenoticPart_RAW)
head(merged_df_raw[,1:3])
merged_df<-merged_df_raw
head(colnames(merged_df))
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"
The names of the feature CpGs can be accessed via “featureName_CpGs”.
featureName_CpGs<-rownames(TopN_CpGs)
length(featureName_CpGs)
## [1] 5000
head(featureName_CpGs)
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"
clean_merged_df<-merged_df
missing_val_cols <- colnames(clean_merged_df)[colSums(is.na(clean_merged_df)) > 0]
colSums(is.na(clean_merged_df))[missing_val_cols]
## ABETA TAU PTAU
## 109 109 109
Choose the imputation method to apply to the data. The output dataset is named “clean_merged_df”.
# Go to the INPUT section and set "Impute_NA_FLAG_NUM":
# to impute NAs with the mean, set "Impute_NA_FLAG_NUM = 1"
# to impute NAs with the KNN method, set "Impute_NA_FLAG_NUM = 2"
Impute_NA_FLAG = Impute_NA_FLAG_NUM
if (Impute_NA_FLAG == 1){
clean_merged_df_imputed_mean<-clean_merged_df
mean_ABETA_rmNA <- mean(clean_merged_df$ABETA, na.rm = TRUE)
clean_merged_df_imputed_mean$ABETA[
is.na(clean_merged_df_imputed_mean$ABETA)] <- mean_ABETA_rmNA
mean_TAU_rmNA <- mean(clean_merged_df$TAU, na.rm = TRUE)
clean_merged_df_imputed_mean$TAU[
is.na(clean_merged_df_imputed_mean$TAU)] <- mean_TAU_rmNA
mean_PTAU_rmNA <- mean(clean_merged_df$PTAU, na.rm = TRUE)
clean_merged_df_imputed_mean$PTAU[
is.na(clean_merged_df_imputed_mean$PTAU)] <- mean_PTAU_rmNA
clean_merged_df = clean_merged_df_imputed_mean
}
library(VIM)
if (Impute_NA_FLAG == 2){
df_imputed_KNN <- kNN(merged_df, k = 5)
imputed_summary <- colSums(df_imputed_KNN[, grep("_imp", names(df_imputed_KNN))])
print(imputed_summary[imputed_summary > 0])
clean_merged_df<-df_imputed_KNN[, -grep("_imp", names(df_imputed_KNN))]
}
missing_val_cols <- colnames(clean_merged_df)[colSums(is.na(clean_merged_df)) > 0]
colSums(is.na(clean_merged_df))[missing_val_cols]
## named numeric(0)
Choose the method we want to use.
# Go to the INPUT section and set "METHOD_FEATURE_FLAG_NUM":
# for 3-class classification, set "METHOD_FEATURE_FLAG_NUM = 1"
# for the PCA method, set "METHOD_FEATURE_FLAG_NUM = 2"
# for 2-class classification, set "METHOD_FEATURE_FLAG_NUM = 3"
METHOD_FEATURE_FLAG = METHOD_FEATURE_FLAG_NUM
if (METHOD_FEATURE_FLAG == 1){
df_fs_method1 <- clean_merged_df
}
if(METHOD_FEATURE_FLAG == 1){
phenotic_features_m1<-c("DX","age.now","PTGENDER",
"PC1","PC2","PC3")
pickedFeatureName_m1<-c(phenotic_features_m1,featureName_CpGs)
df_fs_method1<-clean_merged_df[,pickedFeatureName_m1]
df_fs_method1$DX<-as.factor(df_fs_method1$DX)
df_fs_method1$PTGENDER<-as.factor(df_fs_method1$PTGENDER)
head(df_fs_method1[,1:5],n=3)
dim(df_fs_method1)
}
if(METHOD_FEATURE_FLAG == 1){
dim(df_fs_method1)
}
Create the contrast matrix for comparing CN vs Dementia vs MCI.
if(METHOD_FEATURE_FLAG == 1){
pheno_data_m1 <- df_fs_method1[,phenotic_features_m1]
head(pheno_data_m1[,1:5],n=3)
pheno_data_m1$DX <- factor(pheno_data_m1$DX, levels = c("CN", "MCI", "Dementia"))
design_m1 <- model.matrix(~ 0 + DX + age.now + PTGENDER + PC1 + PC2 + PC3,
data = pheno_data_m1)
colnames(design_m1)[colnames(design_m1) == "DXCN"] <- "CN"
colnames(design_m1)[colnames(design_m1) == "DXDementia"] <- "Dementia"
colnames(design_m1)[colnames(design_m1) == "DXMCI"] <- "MCI"
head(design_m1)
cpg_matrix_m1 <- t(as.matrix(df_fs_method1[, featureName_CpGs]))
fit_m1 <- lmFit(cpg_matrix_m1, design_m1)
}
if(METHOD_FEATURE_FLAG == 1){
# for here, we have three labels. The contrasts to compare groups will be:
contrast_matrix_m1 <- makeContrasts(
MCI_vs_CN = MCI - CN,
Dementia_vs_CN = Dementia - CN,
Dementia_vs_MCI = Dementia - MCI,
levels = design_m1
)
fit2_m1 <- contrasts.fit(fit_m1, contrast_matrix_m1)
fit2_m1 <- eBayes(fit2_m1)
topTable(fit2_m1, coef = "MCI_vs_CN")
topTable(fit2_m1, coef = "Dementia_vs_CN")
topTable(fit2_m1, coef = "Dementia_vs_MCI")
summary_results_m1 <- decideTests(fit2_m1,method = "nestedF", adjust.method = "none", p.value = 0.05)
table(summary_results_m1)
}
if(METHOD_FEATURE_FLAG == 1){
significant_dmp_filter_m1 <- summary_results_m1 != 0
significant_cpgs_m1_DMP <- unique(rownames(summary_results_m1)[
apply(significant_dmp_filter_m1, 1, any)])
print(paste("The significant CpGs after DMP are:",
paste(significant_cpgs_m1_DMP, collapse = ", ")))
print(paste("Length of CpGs after DMP:",
length(significant_cpgs_m1_DMP)))
pickedFeatureName_m1_afterDMP<-c(phenotic_features_m1,significant_cpgs_m1_DMP)
df_fs_method1<-df_fs_method1[,pickedFeatureName_m1_afterDMP]
dim(df_fs_method1)
}
if(METHOD_FEATURE_FLAG == 1){
library(recipes)
df_picked <- df_fs_method1
rec <- recipe(DX ~ ., data = df_picked) %>%
step_zv(all_predictors()) %>%
# step_range(all_numeric(), -all_outcomes()) %>%
step_dummy(all_nominal(), -all_outcomes())%>%
step_corr(all_predictors(), threshold = 0.7)
rec_prep <- prep(rec, df_picked)
processed_data_m1 <- bake(rec_prep, new_data = df_picked)
dim(processed_data_m1)
processed_data_m1_df<-as.data.frame(processed_data_m1)
rownames(processed_data_m1_df)<-rownames(df_picked)
}
if(METHOD_FEATURE_FLAG == 1){
AfterProcess_FeatureName_m1<-colnames(processed_data_m1)
head(AfterProcess_FeatureName_m1)
tail(AfterProcess_FeatureName_m1)
}
if(METHOD_FEATURE_FLAG == 1){
head(processed_data_m1[,1:5])
}
if(METHOD_FEATURE_FLAG == 1){
lastColumn_NUM<-dim(processed_data_m1)[2]
last5Column_NUM<-lastColumn_NUM-5
head(processed_data_m1[,last5Column_NUM :lastColumn_NUM])
}
if(METHOD_FEATURE_FLAG == 2){
bloodPropFeatureName<-c("RID.a","prop.B","prop.NK",
"prop.CD4T","prop.CD8T","prop.Mono",
"prop.Neutro","prop.Eosino")
pickedFeatureName_m2<-c("DX","age.now",
"PTGENDER",bloodPropFeatureName,
"ABETA","TAU","PTAU",featureName_CpGs)
df_fs_method2<-clean_merged_df[,pickedFeatureName_m2]
}
if(METHOD_FEATURE_FLAG == 2){
library(recipes)
rec <- recipe(DX ~ ., data = df_fs_method2) %>%
step_zv(all_predictors()) %>%
step_normalize(all_numeric(), -all_outcomes()) %>%
step_dummy(all_nominal(), -all_outcomes())%>%
step_corr(all_predictors(), threshold = 0.7)
rec_prep <- prep(rec, df_fs_method2)
processed_data_m2 <- bake(rec_prep, new_data = df_fs_method2)
dim(processed_data_m2)
}
if(METHOD_FEATURE_FLAG == 2){
X_df_m2<-subset(processed_data_m2,select = -DX)
Y_df_m2<-processed_data_m2$DX
pca_result <- prcomp(X_df_m2, center = TRUE, scale. = TRUE)
summary(pca_result)
screeplot(pca_result,type="lines")
}
if(METHOD_FEATURE_FLAG == 2){
PCA_component_threshold<-0.7
}
if(METHOD_FEATURE_FLAG == 2){
library(caret)
preproc<-preProcess(X_df_m2,method="pca",
thresh = PCA_component_threshold)
X_df_m2_transformed_PCA <- predict(preproc,X_df_m2)
data_processed_PCA<-data.frame(X_df_m2_transformed_PCA,Y_df_m2)
colnames(data_processed_PCA)[
which(colnames(data_processed_PCA)=="Y_df_m2")]<-"DX"
head(data_processed_PCA)
}
if(METHOD_FEATURE_FLAG == 2){
processed_data_m2<-data_processed_PCA
AfterProcess_FeatureName_m2<-colnames(data_processed_PCA)
}
if(METHOD_FEATURE_FLAG == 3){
df_fs_method3<-clean_merged_df
}
if(METHOD_FEATURE_FLAG == 3){
phenotic_features_m3<-c(
"DX","age.now","PTGENDER","PC1","PC2","PC3")
pickedFeatureName_m3<-c(phenotic_features_m3,featureName_CpGs)
df_picked_m3<-df_fs_method3[,pickedFeatureName_m3]
df_picked_m3$DX<-as.factor(df_picked_m3$DX)
df_picked_m3$PTGENDER<-as.factor(df_picked_m3$PTGENDER)
head(df_picked_m3[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 3){
dim(df_picked_m3)
}
## [1] 648 5006
if(METHOD_FEATURE_FLAG == 3){
df_picked_m3<-df_picked_m3 %>% mutate(
DX = ifelse(DX == "CN", "CN",ifelse(DX
%in% c("MCI","Dementia"),"CI",NA)))
df_picked_m3$DX<-as.factor(df_picked_m3$DX)
df_picked_m3$PTGENDER<-as.factor(df_picked_m3$PTGENDER)
head(df_picked_m3[1:10],n=3)
}
if(METHOD_FEATURE_FLAG == 3){
pheno_data_m3 <- df_picked_m3[,phenotic_features_m3]
head(pheno_data_m3[,1:5],n=3)
design_m3 <- model.matrix(~0 + .,data=pheno_data_m3)
colnames(design_m3)[colnames(design_m3) == "DXCN"] <- "CN"
colnames(design_m3)[colnames(design_m3) == "DXCI"] <- "CI"
head(design_m3)
beta_values_m3 <- t(as.matrix(df_fs_method3[,featureName_CpGs]))
}
To perform the differential analysis (Differentially Methylated Position, DMP), we must define the contrast of interest. In method 3 we focus on two groups, giving one contrast of interest.
if(METHOD_FEATURE_FLAG == 3){
fit_m3 <- lmFit(beta_values_m3, design_m3)
head(fit_m3$coefficients)
contrast.matrix <- makeContrasts(CI - CN, levels = design_m3)
fit2_m3 <- contrasts.fit(fit_m3, contrast.matrix)
# Apply the empirical Bayes’ step to get our differential expression statistics and p-values.
fit2_m3 <- eBayes(fit2_m3)
}
if(METHOD_FEATURE_FLAG == 3){
decideTests(fit2_m3)
}
## TestResults matrix
## Contrasts
## CI - CN
## cg08223187 0
## cg15794987 0
## cg04821830 0
## cg24629711 0
## cg17380855 0
## 4995 more rows ...
if(METHOD_FEATURE_FLAG == 3){
dmp_results_m3_try1 <- decideTests(
fit2_m3, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
table(dmp_results_m3_try1)
}
## dmp_results_m3_try1
## 0
## 5000
if(METHOD_FEATURE_FLAG == 3){
# Identify DMPs, we will use this one:
dmp_results_m3 <- decideTests(
fit2_m3, lfc = 0.01, adjust.method = "none", p.value = 0.1)
table(dmp_results_m3)
}
## dmp_results_m3
## -1 0 1
## 200 4619 181
if(METHOD_FEATURE_FLAG == 3){
significant_dmp_filter <- dmp_results_m3 != 0
significant_cpgs_m3_DMP <- rownames(dmp_results_m3)[
apply(significant_dmp_filter, 1, any)]
pickedFeatureName_m3_afterDMP<-c(phenotic_features_m3,significant_cpgs_m3_DMP)
df_picked_m3<-df_picked_m3[,pickedFeatureName_m3_afterDMP]
dim(df_picked_m3)
}
## [1] 648 387
The “Volcano Plot” is one way to visualize the results of a differential analysis.
The x-axis shows the log-fold change in methylation levels between the two classes. The log fold change (logFC) can be calculated as \(\log_2 \left( \frac{\text{mean}(\text{Group1})}{\text{mean}(\text{Group2})} \right)\).
Interpretation of logFC:
Positive logFC: the measurement is higher in the first group than in the second; here this means hypermethylation (an increase in methylation).
Negative logFC: the measurement is lower in the first group than in the second; here this means hypomethylation (a decrease in methylation) in the experimental condition compared to the reference.
logFC of 0: no difference in the measurement between the two groups.
The y-axis shows some measure of statistical significance, such as the log-odds, or “B” statistic. In the following we will use the B statistic. The log-odds can be calculated as \(B = \log_e(\text{posterior odds})\).
Interpretation of the B-value:
Higher B-value: stronger evidence for differential methylation.
Lower (or negative) B-value: weaker evidence for differential methylation.
B-value close to zero: uncertainty, or a lack of strong evidence, for differential methylation.
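As a toy numeric illustration of the two axes (made-up numbers, not results from this data set):

```r
# Toy illustration of the two volcano-plot axes (numbers are made up).
group1_mean <- 0.60   # mean methylation in group 1
group2_mean <- 0.50   # mean methylation in group 2
logFC <- log2(group1_mean / group2_mean)   # positive => hypermethylated in group 1
round(logFC, 3)  # ~0.263

# B statistic: log posterior odds that the CpG is differentially methylated.
posterior_prob <- 0.95
B <- log(posterior_prob / (1 - posterior_prob))
round(B, 3)  # ~2.944
```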
A characteristic “volcano” shape should be seen. Let’s look at the results:
if(METHOD_FEATURE_FLAG == 3){
full_results_m3 <- topTable(fit2_m3, number=Inf)
full_results_m3 <- tibble::rownames_to_column(full_results_m3,"ID")
head(full_results_m3)
}
if(METHOD_FEATURE_FLAG == 3){
sorted_full_results_m3 <- full_results_m3[
order(full_results_m3$logFC, decreasing = TRUE), ]
head(sorted_full_results_m3)
}
if(METHOD_FEATURE_FLAG == 3){
library(ggplot2)
ggplot(full_results_m3,aes(x = logFC, y=B)) + geom_point()
}
Now, let’s visualize the plot with the cutoffs applied.
if(METHOD_FEATURE_FLAG == 3){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m3 <- full_results_m3 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m3, aes(x = logFC,
y = B, col = Significant, label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
## Warning: ggrepel: 2 unlabeled data points (too many overlaps). Consider increasing max.overlaps
Now, let’s change the y-axis to the P-value.
if(METHOD_FEATURE_FLAG == 3){
ggplot(full_results_m3,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 3){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m3 <- full_results_m3 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m3,
aes(x = logFC, y = -log10(P.Value),
col = Significant,
label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
## Warning: ggrepel: 2 unlabeled data points (too many overlaps). Consider increasing max.overlaps
if(METHOD_FEATURE_FLAG == 3){
library(recipes)
rec <- recipe(DX ~ ., data = df_picked_m3) %>%
step_zv(all_predictors()) %>%
# step_range(all_numeric(), -all_outcomes()) %>%
step_dummy(all_nominal(), -all_outcomes())%>%
step_corr(all_predictors(), threshold = 0.7)
rec_prep <- prep(rec, df_picked_m3)
processed_data_m3 <- bake(rec_prep, new_data = df_picked_m3)
processed_data_m3_df <- as.data.frame(processed_data_m3)
rownames(processed_data_m3_df) <- rownames(df_picked_m3)
dim(processed_data_m3)
}
## [1] 648 314
if(METHOD_FEATURE_FLAG == 3){
AfterProcess_FeatureName_m3<-colnames(processed_data_m3)
head(AfterProcess_FeatureName_m3)
tail(AfterProcess_FeatureName_m3)
}
## [1] "cg21243064" "cg27577781" "cg20685672" "cg03660162" "cg17042243" "DX"
if(METHOD_FEATURE_FLAG == 3){
levels(df_picked_m3$DX)
}
## [1] "CI" "CN"
if(METHOD_FEATURE_FLAG == 3){
head(processed_data_m3[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 3){
lastColumn_NUM_m3<-dim(processed_data_m3)[2]
last5Column_NUM_m3<-lastColumn_NUM_m3-5
head(processed_data_m3[,last5Column_NUM_m3 :lastColumn_NUM_m3])
}
if(METHOD_FEATURE_FLAG == 3){
levels(processed_data_m3$DX)
}
## [1] "CI" "CN"
In this method, only the CN and Dementia (AD) classes are considered.
if(METHOD_FEATURE_FLAG == 4){
df_fs_method4<-clean_merged_df
}
if(METHOD_FEATURE_FLAG == 4){
phenotic_features_m4<-c(
"DX","age.now","PTGENDER","PC1","PC2","PC3")
pickedFeatureName_m4<-c(phenotic_features_m4,featureName_CpGs)
df_picked_m4<-df_fs_method4[,pickedFeatureName_m4]
df_picked_m4$DX<-as.factor(df_picked_m4$DX)
df_picked_m4$PTGENDER<-as.factor(df_picked_m4$PTGENDER)
head(df_picked_m4[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 4){
dim(df_picked_m4)
}
if(METHOD_FEATURE_FLAG == 4){
df_picked_m4<-df_picked_m4 %>% filter(DX != "MCI") %>% droplevels()
df_picked_m4$DX<-as.factor(df_picked_m4$DX)
df_picked_m4$PTGENDER<-as.factor(df_picked_m4$PTGENDER)
head(df_picked_m4[1:10],n=3)
}
if(METHOD_FEATURE_FLAG == 4){
print(dim(df_picked_m4))
print(table(df_picked_m4$DX))
}
if(METHOD_FEATURE_FLAG == 4){
df_fs_method4 <- df_fs_method4 %>% filter(DX != "MCI") %>% droplevels()
df_fs_method4$DX<-as.factor(df_fs_method4$DX)
print(head(df_fs_method4))
print(dim(df_fs_method4))
}
if(METHOD_FEATURE_FLAG == 4){
pheno_data_m4 <- df_picked_m4[,phenotic_features_m4]
print(head(pheno_data_m4[,1:5],n=3))
design_m4 <- model.matrix(~0 + .,data=pheno_data_m4)
colnames(design_m4)[colnames(design_m4) == "DXCN"] <- "CN"
colnames(design_m4)[colnames(design_m4) == "DXDementia"] <- "Dementia"
print(head(design_m4))
beta_values_m4 <- t(as.matrix(df_fs_method4[,featureName_CpGs]))
}
To perform the differential analysis (Differentially Methylated Position, DMP), we must define the contrast of interest. In method 4 we focus on two groups (CN and Dementia), giving one contrast of interest.
if(METHOD_FEATURE_FLAG == 4){
fit_m4 <- lmFit(beta_values_m4, design_m4)
head(fit_m4$coefficients)
contrast.matrix <- makeContrasts(Dementia - CN, levels = design_m4)
fit2_m4 <- contrasts.fit(fit_m4, contrast.matrix)
# Apply the empirical Bayes’ step to get our differential expression statistics and p-values.
fit2_m4 <- eBayes(fit2_m4)
}
if(METHOD_FEATURE_FLAG == 4){
decideTests(fit2_m4)
}
if(METHOD_FEATURE_FLAG == 4){
dmp_results_m4_try1 <- decideTests(
fit2_m4, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
table(dmp_results_m4_try1)
}
The constraint is too strict; let’s relax it.
if(METHOD_FEATURE_FLAG == 4){
# Identify DMPs, we will use this one:
dmp_results_m4 <- decideTests(
fit2_m4, lfc = 0.01, adjust.method = "none", p.value = 0.1)
table(dmp_results_m4)
}
if(METHOD_FEATURE_FLAG == 4){
significant_dmp_filter <- dmp_results_m4 != 0
significant_cpgs_m4_DMP <- rownames(dmp_results_m4)[
apply(significant_dmp_filter, 1, any)]
pickedFeatureName_m4_afterDMP<-c(phenotic_features_m4,significant_cpgs_m4_DMP)
df_picked_m4<-df_picked_m4[,pickedFeatureName_m4_afterDMP]
dim(df_picked_m4)
}
As in method 3, the volcano plot visualizes the DMP results; see the interpretation of logFC and the B statistic given there. A characteristic “volcano” shape should be seen. Let’s look at the results:
if(METHOD_FEATURE_FLAG == 4){
full_results_m4 <- topTable(fit2_m4, number=Inf)
full_results_m4 <- tibble::rownames_to_column(full_results_m4,"ID")
head(full_results_m4)
}
if(METHOD_FEATURE_FLAG == 4){
sorted_full_results_m4 <- full_results_m4[
order(full_results_m4$logFC, decreasing = TRUE), ]
head(sorted_full_results_m4)
}
if(METHOD_FEATURE_FLAG == 4){
library(ggplot2)
ggplot(full_results_m4,aes(x = logFC, y=B)) + geom_point()
}
Now, let’s visualize the plot with the cutoffs applied.
if(METHOD_FEATURE_FLAG == 4){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m4 <- full_results_m4 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m4, aes(x = logFC,
y = B, col = Significant, label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
Now, let’s change the y-axis to the P-value.
if(METHOD_FEATURE_FLAG == 4){
ggplot(full_results_m4,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 4){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m4 <- full_results_m4 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m4,
aes(x = logFC, y = -log10(P.Value),
col = Significant,
label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
if(METHOD_FEATURE_FLAG == 4){
library(recipes)
rec <- recipe(DX ~ ., data = df_picked_m4) %>%
step_zv(all_predictors()) %>%
# step_range(all_numeric(), -all_outcomes()) %>%
step_dummy(all_nominal(), -all_outcomes())%>%
step_corr(all_predictors(), threshold = 0.7)
rec_prep <- prep(rec, df_picked_m4)
processed_data_m4 <- bake(rec_prep, new_data = df_picked_m4)
processed_data_m4_df <- as.data.frame(processed_data_m4)
rownames(processed_data_m4_df) <- rownames(df_picked_m4)
print(dim(processed_data_m4))
}
if(METHOD_FEATURE_FLAG == 4){
AfterProcess_FeatureName_m4<-colnames(processed_data_m4)
print(length(AfterProcess_FeatureName_m4))
head(AfterProcess_FeatureName_m4)
tail(AfterProcess_FeatureName_m4)
}
if(METHOD_FEATURE_FLAG == 4){
levels(df_picked_m4$DX)
}
if(METHOD_FEATURE_FLAG == 4){
lastColumn_NUM_m4<-dim(processed_data_m4)[2]
last5Column_NUM_m4<-lastColumn_NUM_m4-5
head(processed_data_m4[,last5Column_NUM_m4 :lastColumn_NUM_m4])
}
if(METHOD_FEATURE_FLAG == 4){
print(levels(processed_data_m4$DX))
print(dim(processed_data_m4))
}
In this method, only the CN and MCI classes are considered.
if(METHOD_FEATURE_FLAG == 5){
df_fs_method5<-clean_merged_df
}
if(METHOD_FEATURE_FLAG == 5){
phenotic_features_m5<-c(
"DX","age.now","PTGENDER","PC1","PC2","PC3")
pickedFeatureName_m5<-c(phenotic_features_m5,featureName_CpGs)
df_picked_m5<-df_fs_method5[,pickedFeatureName_m5]
df_picked_m5$DX<-as.factor(df_picked_m5$DX)
df_picked_m5$PTGENDER<-as.factor(df_picked_m5$PTGENDER)
head(df_picked_m5[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 5){
dim(df_picked_m5)
}
if(METHOD_FEATURE_FLAG == 5){
df_picked_m5<-df_picked_m5 %>% filter(DX != "Dementia") %>% droplevels()
df_picked_m5$DX<-as.factor(df_picked_m5$DX)
df_picked_m5$PTGENDER<-as.factor(df_picked_m5$PTGENDER)
head(df_picked_m5[1:10],n=3)
}
if(METHOD_FEATURE_FLAG == 5){
print(dim(df_picked_m5))
print(table(df_picked_m5$DX))
}
if(METHOD_FEATURE_FLAG == 5){
df_fs_method5 <- df_fs_method5 %>% filter(DX != "Dementia") %>% droplevels()
df_fs_method5$DX<-as.factor(df_fs_method5$DX)
print(head(df_fs_method5))
print(dim(df_fs_method5))
}
if(METHOD_FEATURE_FLAG == 5){
pheno_data_m5 <- df_picked_m5[,phenotic_features_m5]
print(head(pheno_data_m5[,1:5],n=3))
design_m5 <- model.matrix(~0 + .,data=pheno_data_m5)
colnames(design_m5)[colnames(design_m5) == "DXCN"] <- "CN"
colnames(design_m5)[colnames(design_m5) == "DXMCI"] <- "MCI"
print(head(design_m5))
beta_values_m5 <- t(as.matrix(df_fs_method5[,featureName_CpGs]))
}
To perform the differential analysis and identify Differentially Methylated Positions (DMPs), we have to define the contrast we are interested in. In method 5 we focus on two groups (CN and MCI), so there is one contrast of interest.
if(METHOD_FEATURE_FLAG == 5){
fit_m5 <- lmFit(beta_values_m5, design_m5)
head(fit_m5$coefficients)
contrast.matrix <- makeContrasts(MCI - CN, levels = design_m5)
fit2_m5 <- contrasts.fit(fit_m5, contrast.matrix)
# Apply the empirical Bayes step to get the differential methylation statistics and p-values.
fit2_m5 <- eBayes(fit2_m5)
}
if(METHOD_FEATURE_FLAG == 5){
decideTests(fit2_m5)
}
if(METHOD_FEATURE_FLAG == 5){
dmp_results_m5_try1 <- decideTests(
fit2_m5, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
table(dmp_results_m5_try1)
}
The constraints are too tight; let’s relax them.
if(METHOD_FEATURE_FLAG == 5){
# Identify the DMPs; we will use these results downstream:
dmp_results_m5 <- decideTests(
fit2_m5, lfc = 0.01, adjust.method = "none", p.value = 0.1)
table(dmp_results_m5)
}
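The difference between the two `decideTests` calls above comes purely from multiple-testing correction. A minimal base-R sketch with made-up p-values (illustration only, not values from the actual fit) showing how Benjamini-Hochberg adjustment shrinks the set of probes passing a 0.1 cutoff:

```r
# Made-up p-values standing in for the limma results (illustration only)
set.seed(1)
p_raw <- c(runif(5, 0, 0.01), runif(95, 0.05, 1))

n_unadjusted <- sum(p_raw < 0.1)          # as with adjust.method = "none"
p_fdr <- p.adjust(p_raw, method = "fdr")  # Benjamini-Hochberg adjustment
n_fdr <- sum(p_fdr < 0.1)                 # as with adjust.method = "fdr"

c(unadjusted = n_unadjusted, fdr = n_fdr)  # the FDR count is never larger
```

Since adjusted p-values are always at least as large as the raw ones, loosening `adjust.method` from `"fdr"` to `"none"` can only keep more CpGs.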
if(METHOD_FEATURE_FLAG == 5){
significant_dmp_filter <- dmp_results_m5 != 0
significant_cpgs_m5_DMP <- rownames(dmp_results_m5)[
apply(significant_dmp_filter, 1, any)]
pickedFeatureName_m5_afterDMP<-c(phenotic_features_m5,significant_cpgs_m5_DMP)
df_picked_m5<-df_picked_m5[,pickedFeatureName_m5_afterDMP]
dim(df_picked_m5)
}
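The chunk above keeps a CpG if it is flagged in any contrast; with a single contrast, `apply(..., 1, any)` reduces to a plain comparison. A self-contained sketch with a mock results matrix (hypothetical probe names; the values mimic `decideTests` output in {-1, 0, 1}):

```r
# Mock decideTests-style matrix: one column per contrast
dmp_mock <- matrix(c(1, 0, -1, 0), ncol = 1,
                   dimnames = list(paste0("cg0", 1:4), "MCI - CN"))
sig_filter <- dmp_mock != 0                            # TRUE where up- or down-flagged
sig_cpgs <- rownames(dmp_mock)[apply(sig_filter, 1, any)]
sig_cpgs  # "cg01" "cg03"
```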
The “volcano plot” is one way to visualize the results of a differential analysis.
The x-axis shows the log-fold change in methylation levels between the two classes. The log fold change (logFC) can be calculated as \(\log_2 \left( \frac{\text{mean}(\text{Group1})}{\text{mean}(\text{Group2})} \right)\).
Interpretation of logFC:
Positive logFC: indicates that the measurement is higher in the first group than in the second, which here means hypermethylation (an increase in methylation).
Negative logFC: indicates that the measurement is lower in the first group than in the second, which here means hypomethylation (a decrease in methylation).
logFC of 0: indicates no difference in the measurement between the two groups.
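As a concrete illustration of the logFC definition above, a toy base-R calculation (hypothetical group means, not values taken from the actual fit):

```r
# Toy mean methylation levels for one CpG in two groups (hypothetical values)
mean_group1 <- 0.80  # e.g. MCI
mean_group2 <- 0.40  # e.g. CN
logFC <- log2(mean_group1 / mean_group2)
logFC  # 1: positive, so group 1 is hypermethylated relative to group 2
```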
The y-axis shows some measure of statistical significance, such as the log-odds, or “B” statistic. In the following, we will use the B statistic. The log-odds can be calculated as \(B = \log_e(\text{posterior odds})\).
Interpretation of the B-value:
Higher B-value: indicates stronger evidence for differential methylation.
Lower (or negative) B-value: indicates weaker evidence for differential methylation.
B-value close to zero: indicates uncertainty or a lack of strong evidence for differential methylation.
A characteristic “volcano” shape should be seen. Let’s look at the results:
if(METHOD_FEATURE_FLAG == 5){
full_results_m5 <- topTable(fit2_m5, number=Inf)
full_results_m5 <- tibble::rownames_to_column(full_results_m5,"ID")
head(full_results_m5)
}
if(METHOD_FEATURE_FLAG == 5){
sorted_full_results_m5 <- full_results_m5[
order(full_results_m5$logFC, decreasing = TRUE), ]
head(sorted_full_results_m5)
}
if(METHOD_FEATURE_FLAG == 5){
library(ggplot2)
ggplot(full_results_m5,aes(x = logFC, y=B)) + geom_point()
}
Now, let’s redraw the plot with the cutoffs applied and the top hits labeled.
if(METHOD_FEATURE_FLAG == 5){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m5 <- full_results_m5 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m5, aes(x = logFC,
y = B, col = Significant, label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
Now, let’s change the y-axis to -log10(P value).
if(METHOD_FEATURE_FLAG == 5){
ggplot(full_results_m5,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 5){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m5 <- full_results_m5 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m5,
aes(x = logFC, y = -log10(P.Value),
col = Significant,
label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
if(METHOD_FEATURE_FLAG == 5){
library(recipes)
rec <- recipe(DX ~ ., data = df_picked_m5) %>%
step_zv(all_predictors()) %>%
# step_range(all_numeric(), -all_outcomes()) %>%
step_dummy(all_nominal(), -all_outcomes())%>%
step_corr(all_predictors(), threshold = 0.7)
rec_prep <- prep(rec, df_picked_m5)
processed_data_m5 <- bake(rec_prep, new_data = df_picked_m5)
processed_data_m5_df <- as.data.frame(processed_data_m5)
rownames(processed_data_m5_df) <- rownames(df_picked_m5)
print(dim(processed_data_m5))
}
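In the recipe above, `step_corr(threshold = 0.7)` removes enough predictors that no remaining pair has absolute correlation above 0.7. A base-R sketch of the underlying criterion on mock data (the recipes step uses its own greedy removal order; this only illustrates the check):

```r
set.seed(2)
x1 <- rnorm(50)
mock <- data.frame(x1 = x1,
                   x2 = x1 + rnorm(50, sd = 0.1),  # near-duplicate of x1
                   x3 = rnorm(50))                 # independent predictor
cor_mat <- abs(cor(mock))
diag(cor_mat) <- 0                                 # ignore self-correlation
cor_mat["x1", "x2"] > 0.7   # TRUE: one of x1/x2 would be dropped
any(cor_mat["x3", ] > 0.7)  # x3 is uncorrelated, so it survives the filter
```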
if(METHOD_FEATURE_FLAG == 5){
AfterProcess_FeatureName_m5<-colnames(processed_data_m5)
print(length(AfterProcess_FeatureName_m5))
head(AfterProcess_FeatureName_m5)
tail(AfterProcess_FeatureName_m5)
}
if(METHOD_FEATURE_FLAG == 5){
levels(df_picked_m5$DX)
}
if(METHOD_FEATURE_FLAG == 5){
lastColumn_NUM_m5<-dim(processed_data_m5)[2]
last5Column_NUM_m5<-lastColumn_NUM_m5-5
head(processed_data_m5[,last5Column_NUM_m5:lastColumn_NUM_m5])
}
if(METHOD_FEATURE_FLAG == 5){
print(levels(processed_data_m5$DX))
print(dim(processed_data_m5))
}
In this method, only the MCI and Dementia (AD) classes will be considered.
if(METHOD_FEATURE_FLAG == 6){
df_fs_method6<-clean_merged_df
}
if(METHOD_FEATURE_FLAG == 6){
phenotic_features_m6<-c(
"DX","age.now","PTGENDER","PC1","PC2","PC3")
pickedFeatureName_m6<-c(phenotic_features_m6,featureName_CpGs)
df_picked_m6<-df_fs_method6[,pickedFeatureName_m6]
df_picked_m6$DX<-as.factor(df_picked_m6$DX)
df_picked_m6$PTGENDER<-as.factor(df_picked_m6$PTGENDER)
head(df_picked_m6[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 6){
dim(df_picked_m6)
}
if(METHOD_FEATURE_FLAG == 6){
df_picked_m6<-df_picked_m6 %>% filter(DX != "CN") %>% droplevels()
df_picked_m6$DX<-as.factor(df_picked_m6$DX)
df_picked_m6$PTGENDER<-as.factor(df_picked_m6$PTGENDER)
head(df_picked_m6[1:10],n=3)
}
if(METHOD_FEATURE_FLAG == 6){
print(dim(df_picked_m6))
print(table(df_picked_m6$DX))
}
if(METHOD_FEATURE_FLAG == 6){
df_fs_method6 <- df_fs_method6 %>% filter(DX != "CN") %>% droplevels()
df_fs_method6$DX<-as.factor(df_fs_method6$DX)
print(head(df_fs_method6))
print(dim(df_fs_method6))
}
if(METHOD_FEATURE_FLAG == 6){
pheno_data_m6 <- df_picked_m6[,phenotic_features_m6]
print(head(pheno_data_m6[,1:5],n=3))
design_m6 <- model.matrix(~0 + .,data=pheno_data_m6)
colnames(design_m6)[colnames(design_m6) == "DXDementia"] <- "Dementia"
colnames(design_m6)[colnames(design_m6) == "DXMCI"] <- "MCI"
print(head(design_m6))
beta_values_m6 <- t(as.matrix(df_fs_method6[,featureName_CpGs]))
}
To perform the differential analysis and identify Differentially Methylated Positions (DMPs), we have to define the contrast we are interested in. In method 6 we focus on two groups (MCI and Dementia), so there is one contrast of interest.
if(METHOD_FEATURE_FLAG == 6){
fit_m6 <- lmFit(beta_values_m6, design_m6)
head(fit_m6$coefficients)
contrast.matrix <- makeContrasts(MCI - Dementia, levels = design_m6)
fit2_m6 <- contrasts.fit(fit_m6, contrast.matrix)
# Apply the empirical Bayes step to get the differential methylation statistics and p-values.
fit2_m6 <- eBayes(fit2_m6)
}
if(METHOD_FEATURE_FLAG == 6){
decideTests(fit2_m6)
}
if(METHOD_FEATURE_FLAG == 6){
dmp_results_m6_try1 <- decideTests(
fit2_m6, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
table(dmp_results_m6_try1)
}
The constraints are too tight; let’s relax them.
if(METHOD_FEATURE_FLAG == 6){
# Identify the DMPs; we will use these results downstream:
dmp_results_m6 <- decideTests(
fit2_m6, lfc = 0.01, adjust.method = "none", p.value = 0.1)
table(dmp_results_m6)
}
if(METHOD_FEATURE_FLAG == 6){
significant_dmp_filter <- dmp_results_m6 != 0
significant_cpgs_m6_DMP <- rownames(dmp_results_m6)[
apply(significant_dmp_filter, 1, any)]
pickedFeatureName_m6_afterDMP<-c(phenotic_features_m6,significant_cpgs_m6_DMP)
df_picked_m6<-df_picked_m6[,pickedFeatureName_m6_afterDMP]
dim(df_picked_m6)
}
The “volcano plot” is one way to visualize the results of a differential analysis.
The x-axis shows the log-fold change in methylation levels between the two classes. The log fold change (logFC) can be calculated as \(\log_2 \left( \frac{\text{mean}(\text{Group1})}{\text{mean}(\text{Group2})} \right)\).
Interpretation of logFC:
Positive logFC: indicates that the measurement is higher in the first group than in the second, which here means hypermethylation (an increase in methylation).
Negative logFC: indicates that the measurement is lower in the first group than in the second, which here means hypomethylation (a decrease in methylation).
logFC of 0: indicates no difference in the measurement between the two groups.
The y-axis shows some measure of statistical significance, such as the log-odds, or “B” statistic. In the following, we will use the B statistic. The log-odds can be calculated as \(B = \log_e(\text{posterior odds})\).
Interpretation of the B-value:
Higher B-value: indicates stronger evidence for differential methylation.
Lower (or negative) B-value: indicates weaker evidence for differential methylation.
B-value close to zero: indicates uncertainty or a lack of strong evidence for differential methylation.
A characteristic “volcano” shape should be seen. Let’s look at the results:
if(METHOD_FEATURE_FLAG == 6){
full_results_m6 <- topTable(fit2_m6, number=Inf)
full_results_m6 <- tibble::rownames_to_column(full_results_m6,"ID")
head(full_results_m6)
}
if(METHOD_FEATURE_FLAG == 6){
sorted_full_results_m6 <- full_results_m6[
order(full_results_m6$logFC, decreasing = TRUE), ]
head(sorted_full_results_m6)
}
if(METHOD_FEATURE_FLAG == 6){
library(ggplot2)
ggplot(full_results_m6,aes(x = logFC, y=B)) + geom_point()
}
Now, let’s redraw the plot with the cutoffs applied and the top hits labeled.
if(METHOD_FEATURE_FLAG == 6){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m6 <- full_results_m6 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m6, aes(x = logFC,
y = B, col = Significant, label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
Now, let’s change the y-axis to -log10(P value).
if(METHOD_FEATURE_FLAG == 6){
ggplot(full_results_m6,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 6){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m6 <- full_results_m6 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m6,
aes(x = logFC, y = -log10(P.Value),
col = Significant,
label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
if(METHOD_FEATURE_FLAG == 6){
library(recipes)
rec <- recipe(DX ~ ., data = df_picked_m6) %>%
step_zv(all_predictors()) %>%
# step_range(all_numeric(), -all_outcomes()) %>%
step_dummy(all_nominal(), -all_outcomes())%>%
step_corr(all_predictors(), threshold = 0.7)
rec_prep <- prep(rec, df_picked_m6)
processed_data_m6 <- bake(rec_prep, new_data = df_picked_m6)
processed_data_m6_df <- as.data.frame(processed_data_m6)
rownames(processed_data_m6_df) <- rownames(df_picked_m6)
print(dim(processed_data_m6))
}
if(METHOD_FEATURE_FLAG == 6){
AfterProcess_FeatureName_m6<-colnames(processed_data_m6)
print(length(AfterProcess_FeatureName_m6))
head(AfterProcess_FeatureName_m6)
tail(AfterProcess_FeatureName_m6)
}
if(METHOD_FEATURE_FLAG == 6){
levels(df_picked_m6$DX)
}
if(METHOD_FEATURE_FLAG == 6){
lastColumn_NUM_m6<-dim(processed_data_m6)[2]
last5Column_NUM_m6<-lastColumn_NUM_m6-5
head(processed_data_m6[,last5Column_NUM_m6:lastColumn_NUM_m6])
}
if(METHOD_FEATURE_FLAG == 6){
print(levels(processed_data_m6$DX))
print(dim(processed_data_m6))
}
The name for “processed_data” could be:
“processed_data_m1”, which uses method one to process the data.
“processed_data_m2”, which uses method two to process the data; note that the features will be principal components.
“processed_data_m3”, which uses method three to process the data. This method converts “DX” to a binary class: “CN” stays the same, while “MCI” and “Dementia” are merged into “CI”.
Note that “processed_data_m3_df” is the data-frame form of “processed_data_m3”, with sample names as row names.
“processed_data_m4”, which uses method four to process the data. This method filters “DX” (dropping the “MCI” class), limiting it to the CN and Dementia (AD) classes.
“processed_data_m5”, which uses method five to process the data. This method filters “DX” (dropping the “Dementia” class), limiting it to the CN and MCI classes.
“processed_data_m6”, which uses method six to process the data. This method filters “DX” (dropping the “CN” class), limiting it to the MCI and Dementia classes.
The name for “AfterProcess_FeatureName” (which includes the “DX” label) follows the same pattern:
if(METHOD_FEATURE_FLAG==1){
processed_dataFrame<-processed_data_m1_df
processed_data<-processed_data_m1
AfterProcess_FeatureName<-AfterProcess_FeatureName_m1
}
if(METHOD_FEATURE_FLAG==2){
processed_dataFrame<-processed_data_m2_df
processed_data<-processed_data_m2
AfterProcess_FeatureName<-AfterProcess_FeatureName_m2
}
if(METHOD_FEATURE_FLAG==3){
processed_dataFrame<-processed_data_m3_df
processed_data<-processed_data_m3
AfterProcess_FeatureName<-AfterProcess_FeatureName_m3
}
if(METHOD_FEATURE_FLAG==4){
processed_dataFrame<-processed_data_m4_df
processed_data<-processed_data_m4
AfterProcess_FeatureName<-AfterProcess_FeatureName_m4
}
if(METHOD_FEATURE_FLAG==5){
processed_dataFrame<-processed_data_m5_df
processed_data<-processed_data_m5
AfterProcess_FeatureName<-AfterProcess_FeatureName_m5
}
if(METHOD_FEATURE_FLAG==6){
processed_dataFrame<-processed_data_m6_df
processed_data<-processed_data_m6
AfterProcess_FeatureName<-AfterProcess_FeatureName_m6
}
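The six dispatch blocks above can equivalently be written as a single lookup built from the flag. A hedged alternative sketch (using a mock object in place of the real processed_data_m1; `get()` simply fetches a variable by its constructed name):

```r
# Mock stand-in for one of the method-specific objects (illustration only)
processed_data_m1 <- data.frame(age.now = c(82.4, 78.6))
METHOD_FEATURE_FLAG <- 1

suffix <- paste0("_m", METHOD_FEATURE_FLAG)
processed_data <- get(paste0("processed_data", suffix))
identical(processed_data, processed_data_m1)  # TRUE
```

The same `paste0(..., suffix)` pattern would fetch the matching `_df` and `AfterProcess_FeatureName` objects, avoiding the repeated if blocks.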
print(head(processed_dataFrame))
## age.now PC1 PC2 PC3 cg18993517 cg13573375 cg17002338 cg02621446 cg24470466 cg08896901 cg23916408 cg12146221 cg25174111 cg05234269 cg14293999 cg14307563
## 200223270003_R02C01 82.4 -0.214185447 0.01470293 -0.014043316 0.2091538 0.8670419 0.9286251 0.8731313 0.7725300 0.3581911 0.1942275 0.2049284 0.8526503 0.93848584 0.2836710 0.1855966
## 200223270003_R03C01 78.6 -0.172761185 0.05745834 0.005055871 0.2665896 0.1733934 0.2684163 0.8095534 0.9041432 0.2467071 0.9154993 0.1814927 0.8573844 0.57461229 0.9172023 0.8916957
## 200223270003_R06C01 80.4 -0.003667305 0.08372861 0.029143653 0.2574003 0.8888246 0.2811103 0.7511582 0.1206738 0.9225209 0.8886255 0.8619250 0.2567745 0.02467208 0.9168166 0.8750052
## ... [wide output truncated: remaining CpG beta-value columns omitted; the final DX column reads CI, CN, CN for these three samples] ...
## [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
print(dim(processed_dataFrame))
## [1] 648 314
print(length(AfterProcess_FeatureName))
## [1] 314
print(head(processed_data))
## # A tibble: 6 × 314
## age.now PC1 PC2 PC3 cg18993517 cg13573375 cg17002338 cg02621446 cg24470466 cg08896901 cg23916408 cg12146221 cg25174111 cg05234269 cg14293999 cg14307563 cg21209485 cg11331837
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 82.4 -0.214 0.0147 -0.0140 0.209 0.867 0.929 0.873 0.773 0.358 0.194 0.205 0.853 0.938 0.284 0.186 0.887 0.0369
## 2 78.6 -0.173 0.0575 0.00506 0.267 0.173 0.268 0.810 0.904 0.247 0.915 0.181 0.857 0.575 0.917 0.892 0.871 0.572
## 3 80.4 -0.00367 0.0837 0.0291 0.257 0.889 0.281 0.751 0.121 0.923 0.889 0.862 0.257 0.0247 0.917 0.875 0.229 0.0318
## 4 78.2 -0.187 -0.0112 -0.0323 0.0945 0.131 0.271 0.877 0.927 0.342 0.887 0.124 0.190 0.565 0.919 0.898 0.235 0.0383
## 5 62.9 0.0268 0.0000165 0.0529 0.940 0.161 0.880 0.205 0.190 0.924 0.222 0.202 0.267 0.948 0.197 0.876 0.888 0.930
## 6 80.7 -0.0379 0.0157 -0.00869 0.950 0.851 0.931 0.796 0.207 0.264 0.152 0.138 0.205 0.563 0.903 0.917 0.229 0.540
## # ℹ 296 more variables: cg11187460 <dbl>, cg13653328 <dbl>, cg09451339 <dbl>, cg06961873 <dbl>, cg23159970 <dbl>, cg10788927 <dbl>, cg05392160 <dbl>, cg04540199 <dbl>, cg01608425 <dbl>,
## # cg18285382 <dbl>, cg24851651 <dbl>, cg22071943 <dbl>, cg24643105 <dbl>, cg03549208 <dbl>, cg17653352 <dbl>, cg14252149 <dbl>, cg04831745 <dbl>, cg13372276 <dbl>, cg07640670 <dbl>,
## # cg25879395 <dbl>, cg05593887 <dbl>, cg18339359 <dbl>, cg08283200 <dbl>, cg21783012 <dbl>, cg03221390 <dbl>, cg05321907 <dbl>, cg10985055 <dbl>, cg15912814 <dbl>, cg03327352 <dbl>,
## # cg20300784 <dbl>, cg16733676 <dbl>, cg05130642 <dbl>, cg04462915 <dbl>, cg25649515 <dbl>, cg22169467 <dbl>, cg00553601 <dbl>, cg03749159 <dbl>, cg03088219 <dbl>, cg13038195 <dbl>,
## # cg17738613 <dbl>, cg02772171 <dbl>, cg12240569 <dbl>, cg06012903 <dbl>, cg14192979 <dbl>, cg17906851 <dbl>, cg01933473 <dbl>, cg16089727 <dbl>, cg18857647 <dbl>, cg00767423 <dbl>,
## # cg08041188 <dbl>, cg03737947 <dbl>, cg26679884 <dbl>, cg22305850 <dbl>, cg26846609 <dbl>, cg04867412 <dbl>, cg02823329 <dbl>, cg00295418 <dbl>, cg21139150 <dbl>, cg25059696 <dbl>,
## # cg21388339 <dbl>, cg13815695 <dbl>, cg04888234 <dbl>, cg08096656 <dbl>, cg14780448 <dbl>, cg17723206 <dbl>, cg03600007 <dbl>, cg11438323 <dbl>, cg00322820 <dbl>, cg15535896 <dbl>, …
print(dim(processed_data))
## [1] 648 314
print(AfterProcess_FeatureName)
## [1] "age.now" "PC1" "PC2" "PC3" "cg18993517" "cg13573375" "cg17002338" "cg02621446" "cg24470466" "cg08896901" "cg23916408" "cg12146221" "cg25174111" "cg05234269" "cg14293999"
## [16] "cg14307563" "cg21209485" "cg11331837" "cg11187460" "cg13653328" "cg09451339" "cg06961873" "cg23159970" "cg10788927" "cg05392160" "cg04540199" "cg01608425" "cg18285382" "cg24851651" "cg22071943"
## [31] "cg24643105" "cg03549208" "cg17653352" "cg14252149" "cg04831745" "cg13372276" "cg07640670" "cg25879395" "cg05593887" "cg18339359" "cg08283200" "cg21783012" "cg03221390" "cg05321907" "cg10985055"
## [46] "cg15912814" "cg03327352" "cg20300784" "cg16733676" "cg05130642" "cg04462915" "cg25649515" "cg22169467" "cg00553601" "cg03749159" "cg03088219" "cg13038195" "cg17738613" "cg02772171" "cg12240569"
## [61] "cg06012903" "cg14192979" "cg17906851" "cg01933473" "cg16089727" "cg18857647" "cg00767423" "cg08041188" "cg03737947" "cg26679884" "cg22305850" "cg26846609" "cg04867412" "cg02823329" "cg00295418"
## [76] "cg21139150" "cg25059696" "cg21388339" "cg13815695" "cg04888234" "cg08096656" "cg14780448" "cg17723206" "cg03600007" "cg11438323" "cg00322820" "cg15535896" "cg18698799" "cg12501287" "cg01462799"
## [91] "cg10738648" "cg23836570" "cg09785377" "cg16536985" "cg02122327" "cg12784167" "cg15633912" "cg02495179" "cg19471911" "cg20823859" "cg02078724" "cg04242342" "cg20981163" "cg00345083" "cg09247979"
## [106] "cg02246922" "cg20566384" "cg25436480" "cg06483046" "cg02550738" "cg01008088" "cg20078646" "cg11268585" "cg06864789" "cg04316537" "cg27224751" "cg00939409" "cg26983017" "cg15184869" "cg06403901"
## [121] "cg13387643" "cg04768387" "cg17268094" "cg01128042" "cg14507637" "cg16202259" "cg19799454" "cg08198851" "cg05891136" "cg04412904" "cg11227702" "cg18150287" "cg12333628" "cg14168080" "cg27160885"
## [136] "cg05161773" "cg25306893" "cg14181112" "cg02932958" "cg00962106" "cg08745107" "cg01662749" "cg11286989" "cg15775217" "cg24139837" "cg04645024" "cg01280698" "cg11314779" "cg21697769" "cg13739190"
## [151] "cg12543766" "cg09120722" "cg27070288" "cg16715186" "cg00696044" "cg00084271" "cg24883219" "cg02627240" "cg20673830" "cg08788093" "cg07951602" "cg02389264" "cg15586958" "cg01153376" "cg15600437"
## [166] "cg23352245" "cg22542451" "cg10701746" "cg17386240" "cg11540596" "cg22666875" "cg04156077" "cg23177161" "cg00648024" "cg10681981" "cg02668233" "cg11706829" "cg02356645" "cg00146240" "cg24307368"
## [181] "cg04497611" "cg24697433" "cg18949721" "cg07480955" "cg12556569" "cg22931151" "cg14532717" "cg13226272" "cg08584917" "cg06394820" "cg17131279" "cg07138269" "cg26081710" "cg25758034" "cg22112152"
## [196] "cg19301366" "cg00819121" "cg10091792" "cg21507367" "cg16779438" "cg14710850" "cg06118351" "cg11019791" "cg01910713" "cg22535849" "cg21757617" "cg08857872" "cg20678988" "cg16431720" "cg02887598"
## [211] "cg16858433" "cg12702014" "cg01921484" "cg00415024" "cg16338321" "cg12776173" "cg18029737" "cg02643260" "cg25712921" "cg03084184" "cg04124201" "cg01549082" "cg26948066" "cg09015880" "cg11133939"
## [226] "cg15700429" "cg25277809" "cg12421087" "cg24634455" "cg03359067" "cg02225060" "cg25169289" "cg00512739" "cg04109990" "cg13368637" "cg12279734" "cg23066280" "cg06880438" "cg10666341" "cg10240127"
## [241] "cg23432430" "cg16652920" "cg12228670" "cg19503462" "cg07028768" "cg26853071" "cg06277607" "cg11787167" "cg17296678" "cg06960717" "cg00086247" "cg09584650" "cg27272246" "cg10738049" "cg12689021"
## [256] "cg21986118" "cg20208879" "cg22741595" "cg05850457" "cg04664583" "cg09216282" "cg03982462" "cg05064044" "cg06715136" "cg20803293" "cg15501526" "cg06833284" "cg16571124" "cg07158503" "cg06371647"
## [271] "cg17671604" "cg14175932" "cg03979311" "cg15730644" "cg18819889" "cg25208881" "cg08861434" "cg04718469" "cg17002719" "cg17429539" "cg08554146" "cg00322003" "cg14170504" "cg07504457" "cg18526121"
## [286] "cg11247378" "cg05876883" "cg23517115" "cg13405878" "cg03672288" "cg18816397" "cg14687298" "cg14627380" "cg10864200" "cg00154902" "cg15098922" "cg15985500" "cg26901661" "cg10039445" "cg00004073"
## [301] "cg07634717" "cg27452255" "cg06697310" "cg02631626" "cg17129965" "cg06231502" "cg09727210" "cg18918831" "cg21243064" "cg27577781" "cg20685672" "cg03660162" "cg17042243" "DX"
print("Number of Features :")
## [1] "Number of Features :"
Num_feaForProcess = length(AfterProcess_FeatureName)-1 # exclude the "DX" label
print(Num_feaForProcess)
## [1] 313
df_LRM1<-processed_data
featureName_LRM1<-AfterProcess_FeatureName
library(glmnet)
library(caret)
set.seed(123) # for reproducibility
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 454 314
dim(testData)
## [1] 194 314
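`createDataPartition` draws a stratified sample, so the roughly 2:1 CI/CN ratio in the full data carries into both partitions. A self-contained sketch on synthetic labels (not the ADNI data) illustrating this:

```r
library(caret)
set.seed(123)
# synthetic two-class outcome with a 2:1 imbalance, standing in for DX
toy <- data.frame(DX = factor(rep(c("CI", "CN"), times = c(200, 100))))
idx <- createDataPartition(toy$DX, p = 0.7, list = FALSE)
# class proportions are (approximately) preserved in both partitions
prop.table(table(toy$DX))
prop.table(table(toy$DX[idx]))
prop.table(table(toy$DX[-idx]))
```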
ctrl <- trainControl(method = "cv", number = 5)
model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM1, newdata = testData,type="raw")
cm_modelTrain_LRM1 <- caret::confusionMatrix(predictions, testData$DX)
print(cm_modelTrain_LRM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 123 19
## CN 5 47
##
## Accuracy : 0.8763
## 95% CI : (0.8215, 0.9191)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 4.641e-12
##
## Kappa : 0.7095
##
## Mcnemar's Test P-Value : 0.007963
##
## Sensitivity : 0.9609
## Specificity : 0.7121
## Pos Pred Value : 0.8662
## Neg Pred Value : 0.9038
## Prevalence : 0.6598
## Detection Rate : 0.6340
## Detection Prevalence : 0.7320
## Balanced Accuracy : 0.8365
##
## 'Positive' Class : CI
##
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_modelTrain_LRM1_Accuracy<-cm_modelTrain_LRM1$overall["Accuracy"]
cm_modelTrain_LRM1_Kappa<-cm_modelTrain_LRM1$overall["Kappa"]
print(cm_modelTrain_LRM1_Accuracy)
## Accuracy
## 0.8762887
print(cm_modelTrain_LRM1_Kappa)
## Kappa
## 0.7095084
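The Kappa value above can be reproduced by hand from the confusion matrix: it compares the observed accuracy with the agreement expected by chance given the row and column totals. A quick check in base R:

```r
# reproduce caret's Kappa from the LRM1 confusion matrix printed above
tab <- matrix(c(123, 5, 19, 47), nrow = 2,
              dimnames = list(Prediction = c("CI", "CN"),
                              Reference  = c("CI", "CN")))
n  <- sum(tab)
po <- sum(diag(tab)) / n                       # observed accuracy, 170/194
pe <- sum(rowSums(tab) * colSums(tab)) / n^2   # chance agreement
(po - pe) / (1 - pe)                           # Kappa, ~0.7095
```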
print(model_LRM1)
## glmnet
##
## 454 samples
## 313 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 363, 363, 363, 364
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.0001769938 0.7599023 0.4526141
## 0.10 0.0017699384 0.7686691 0.4684706
## 0.10 0.0176993845 0.7707937 0.4674205
## 0.55 0.0001769938 0.7466911 0.4270706
## 0.55 0.0017699384 0.7334554 0.4012873
## 0.55 0.0176993845 0.6958974 0.2927531
## 1.00 0.0001769938 0.7158730 0.3658615
## 1.00 0.0017699384 0.7202198 0.3706121
## 1.00 0.0176993845 0.6541392 0.1764299
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.01769938.
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.997797356828194"
modelTrain_LRM1_trainAccuracy<-train_accuracy
print(modelTrain_LRM1_trainAccuracy)
## [1] 0.9977974
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
modelTrain_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print("The mean accuracy of the resampling results across tuning parameters is:")
## [1] "The mean accuracy of the resampling results across tuning parameters is:"
print(modelTrain_mean_accuracy_cv_LRM1)
## [1] 0.7295157
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG ==5){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
modelTrain_LRM1_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==4 || METHOD_FEATURE_FLAG==6 ){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
modelTrain_LRM1_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==3){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
modelTrain_LRM1_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
##
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
##
## Data: prob_predictions[, "CI"] in 66 controls (testData$DX CN) < 128 cases (testData$DX CI).
## Area under the curve: 0.9149
## [1] "The auc value is:"
## Area under the curve: 0.9149
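In the binary branches, `pROC::roc` only needs the true labels and the predicted probability of one class; the `levels` argument fixes which level is treated as control and which as case. A minimal sketch on synthetic probabilities (the names here are illustrative, not taken from the pipeline):

```r
library(pROC)
set.seed(1)
labels <- factor(rep(c("CN", "CI"), each = 50))
# fake probabilities for the "CI" class: cases score higher on average
p_ci <- ifelse(labels == "CI", rbeta(100, 4, 2), rbeta(100, 2, 4))
rc <- roc(labels, p_ci, levels = c("CN", "CI"), direction = "<")
auc(rc)  # area under the curve on the synthetic data
```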
if (METHOD_FEATURE_FLAG ==1){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData$DX)
for (class in classes) {
binary_labels <- ifelse(testData$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
# use one fixed colour per class so the curves and the legend match
cols <- c("blue", "red", "darkgreen")
plot(roc_curves[[1]], col = cols[1],
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = cols[i], lwd = 2)
}
legend("bottomright", legend = classes, col = cols[1:length(classes)], lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
modelTrain_LRM1_AUC <- mean_auc
}
print(modelTrain_LRM1_AUC)
## Area under the curve: 0.9149
importance_model_LRM1 <- varImp(model_LRM1)
print(importance_model_LRM1)
## glmnet variable importance
##
## only 20 most important variables shown (out of 313)
##
## Overall
## PC3 100.00
## PC1 66.84
## cg23432430 49.20
## cg09727210 47.85
## PC2 43.08
## cg00962106 41.16
## cg07158503 40.42
## cg06697310 40.25
## cg02225060 35.49
## cg09015880 35.48
## cg10701746 34.84
## cg16338321 33.99
## cg00819121 32.46
## cg26081710 32.36
## cg00415024 31.28
## cg21757617 30.74
## cg14168080 30.58
## cg02887598 29.89
## cg05064044 29.82
## cg01910713 28.88
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")
importance_model_LRM1_df<-importance_model_LRM1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 ||METHOD_FEATURE_FLAG==6 ){
importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)
library(dplyr)
ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>% arrange(desc(Overall))
print(ordered_importance_final_model_LRM1)
}
## Overall
## 1 3.6285180181
## 2 2.4253396315
## 3 1.7851949468
## 4 1.7363184604
## 5 1.5632207146
## 6 1.4936497094
## 7 1.4664812319
## 8 1.4603980581
## 9 1.2878873558
## 10 1.2875537792
## 11 1.2641672553
## 12 1.2333618130
## 13 1.1778382769
## 14 1.1740457452
## 15 1.1348195426
## 16 1.1153125101
## 17 1.1097799080
## 18 1.0847220785
## 19 1.0818839266
## 20 1.0478938648
## ... (rows 21-313 omitted: importance decays smoothly toward zero, and the last 42 features have importance exactly 0)
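The ordered importance table above prints without row names, so its values cannot be traced back to individual CpGs. One way to keep the names attached (an assumption about this pipeline, sketched on a toy glmnet fit rather than the real model) is to pull the named coefficients from the final model at the selected lambda:

```r
library(glmnet)
set.seed(42)
x <- matrix(rnorm(100 * 10), 100, 10,
            dimnames = list(NULL, paste0("cg", sprintf("%08d", 1:10))))
y <- factor(ifelse(x[, 1] + rnorm(100) > 0, "CI", "CN"))
fit <- glmnet(x, y, family = "binomial", alpha = 0.1)
# coefficients at a chosen lambda (0.0177, as selected above) keep feature names
cf <- coef(fit, s = 0.0177)
imp <- abs(cf[-1, 1])           # drop the intercept, take |coefficient|
head(sort(imp, decreasing = TRUE))
```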
if(METHOD_FEATURE_FLAG==1){
# for the multi classification case,
# for each feature, we will choose the maximum importance value
# Add a column for the maximum importance
importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
importance_model_LRM1_df <- importance_model_LRM1_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_model_LRM1_df)
}
# require() both loads the package and reports whether it is installed,
# so a second library() call is only needed after installing
if (!require(reshape2)) {
install.packages("reshape2")
library(reshape2)
}
library(ggplot2) # used by the importance plots below (may already be loaded earlier)
if(METHOD_FEATURE_FLAG == 1){
importance_melted_LRM1_df <- importance_model_LRM1_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_model_LRM1_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_model_LRM1_df,n=20)$Feature)
importance_melted_LRM1_df <- importance_model_LRM1_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
table(df_LRM1$DX)
##
## CI CN
## 427 221
prop.table(table(df_LRM1$DX))
##
## CI CN
## 0.6589506 0.3410494
table(trainData$DX)
##
## CI CN
## 299 155
prop.table(table(trainData$DX))
##
## CI CN
## 0.6585903 0.3414097
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")
For the training data set:
barplot(table(trainData$DX), main = "Train Data Class Distribution")
Let’s calculate the imbalance ratio: the number of samples in the majority class divided by the number of samples in the minority class. A high ratio indicates severe class imbalance.
class_counts <- table(df_LRM1$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the whole data set is:")
## [1] "The imbalance ratio of the whole data set is:"
print(imbalance_ratio)
## [1] 1.932127
class_counts <- table(trainData$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the training data set is:")
## [1] "The imbalance ratio of the training data set is:"
print(imbalance_ratio)
## [1] 1.929032
Let’s run a Chi-square test, which can determine whether the class distribution deviates significantly from a balanced distribution. The p-value of the test indicates how significant the class imbalance is.
chisq.test(table(df_LRM1$DX))
##
## Chi-squared test for given probabilities
##
## data: table(df_LRM1$DX)
## X-squared = 65.488, df = 1, p-value = 5.848e-16
chisq.test(table(trainData$DX))
##
## Chi-squared test for given probabilities
##
## data: table(trainData$DX)
## X-squared = 45.674, df = 1, p-value = 1.397e-11
library(smotefamily)
smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"], target = trainData$DX, K = 5, dup_size = 1)
balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
##
## CI CN
## 299 310
dim(balanced_data_LGR_1)
## [1] 609 314
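`smotefamily::SMOTE` takes the numeric feature matrix and the target vector separately; `K` is the number of nearest neighbours used to synthesize minority samples, and `dup_size` is how many synthetic samples are generated per original minority sample (above, `dup_size = 1` doubled CN from 155 to 310). A toy run on synthetic data:

```r
library(smotefamily)
set.seed(7)
# imbalanced toy data: 150 majority vs 50 minority samples, two numeric features
X <- data.frame(f1 = c(rnorm(150, 0), rnorm(50, 2)),
                f2 = c(rnorm(150, 0), rnorm(50, 2)))
y <- rep(c("CI", "CN"), times = c(150, 50))
sm <- SMOTE(X = X, target = y, K = 5, dup_size = 2)
table(sm$data$class)  # CN grows from 50 to 150 (two synthetic per minority sample)
```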
ctrl <- trainControl(method = "cv", number = 5)
model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM2, newdata = testData)
cm_modelTrain_LRM2<-caret::confusionMatrix(predictions, testData$DX)
print(cm_modelTrain_LRM2)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 116 13
## CN 12 53
##
## Accuracy : 0.8711
## 95% CI : (0.8157, 0.9148)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 1.656e-11
##
## Kappa : 0.7119
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.9062
## Specificity : 0.8030
## Pos Pred Value : 0.8992
## Neg Pred Value : 0.8154
## Prevalence : 0.6598
## Detection Rate : 0.5979
## Detection Prevalence : 0.6649
## Balanced Accuracy : 0.8546
##
## 'Positive' Class : CI
##
cm_modelTrain_LRM2_Accuracy<-cm_modelTrain_LRM2$overall["Accuracy"]
cm_modelTrain_LRM2_Kappa<-cm_modelTrain_LRM2$overall["Kappa"]
print(cm_modelTrain_LRM2_Accuracy)
## Accuracy
## 0.871134
print(cm_modelTrain_LRM2_Kappa)
## Kappa
## 0.7118926
print(model_LRM2)
## glmnet
##
## 609 samples
## 313 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 487, 487, 487, 487, 488
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.000214353 0.8735131 0.7461978
## 0.10 0.002143530 0.8767918 0.7528013
## 0.10 0.021435296 0.8784176 0.7560836
## 0.55 0.000214353 0.8702344 0.7396263
## 0.55 0.002143530 0.8685544 0.7362837
## 0.55 0.021435296 0.8340875 0.6673492
## 1.00 0.000214353 0.8521881 0.7033964
## 1.00 0.002143530 0.8521474 0.7034607
## 1.00 0.021435296 0.7749763 0.5489226
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.0214353.
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
modelTrain_LRM2_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", modelTrain_LRM2_trainAccuracy))
## [1] "Training Accuracy: 1"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The mean accuracy of the resampling results across tuning parameters is:")
## [1] "The mean accuracy of the resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.8534345
modelTrain_LRM2_mean_accuracy_model_LRM2 <- mean_accuracy_model_LRM2
print(modelTrain_LRM2_mean_accuracy_model_LRM2)
## [1] 0.8534345
importance_model_LRM2 <- varImp(model_LRM2)
print(importance_model_LRM2)
## glmnet variable importance
##
## only 20 most important variables shown (out of 313)
##
## Overall
## PC3 100.00
## PC1 45.65
## cg23432430 35.91
## cg09727210 33.49
## PC2 32.09
## cg00962106 30.88
## cg06697310 30.37
## cg07158503 29.15
## cg10701746 27.04
## cg26081710 26.68
## cg02225060 25.91
## cg09015880 25.77
## cg21757617 25.45
## cg00819121 25.27
## cg16338321 24.73
## cg00415024 24.47
## cg07504457 24.31
## cg14168080 23.03
## cg05064044 22.62
## cg16858433 22.51
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")
importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG==3||METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG ==5 || METHOD_FEATURE_FLAG == 6){
importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)
library(dplyr)
ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))
print(ordered_importance_final_model_LRM2)
}
## Overall
## 1 4.73724133
## 2 2.16271426
## 3 1.70095331
## 4 1.58633106
## 5 1.52033522
## 6 1.46294601
## 7 1.43875267
## 8 1.38075811
## 9 1.28098848
## 10 1.26407502
## 11 1.22724256
## 12 1.22079741
## 13 1.20580648
## 14 1.19717932
## 15 1.17133288
## 16 1.15897517
## 17 1.15144927
## 18 1.09088660
## 19 1.07172638
## 20 1.06653191
## ... (rows 21-313 omitted: importance decays smoothly toward zero, and the last 45 features have importance exactly 0)
if(METHOD_FEATURE_FLAG==1){
# for the multi classification case,
# for each feature, we will choose the maximum importance value
# Add a column for the maximum importance
importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
importance_model_LRM2_df <- importance_model_LRM2_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_model_LRM2_df)
}
if(METHOD_FEATURE_FLAG == 1){
importance_melted_LRM2_df <- importance_model_LRM2_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM2_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_model_LRM2_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_model_LRM2_df,n=20)$Feature)
importance_melted_LRM2_df <- importance_model_LRM2_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM2_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
modelTrain_LRM2_AUC <-auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
modelTrain_LRM2_AUC <-auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
modelTrain_LRM2_AUC <-auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
##
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
##
## Data: prob_predictions[, "CI"] in 66 controls (testData$DX CN) < 128 cases (testData$DX CI).
## Area under the curve: 0.9097
## [1] "The auc value is:"
## Area under the curve: 0.9097
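The `## Setting direction: controls < cases` message above comes from pROC. Its `levels` argument is `c(controls, cases)`, so `rev(levels(testData$DX))` makes "CN" the control group when the factor levels are `c("CI", "CN")`, matching the printed output. A minimal sketch on synthetic data (the probabilities here are hypothetical, not model output):

```r
# Sketch with synthetic data: levels = c(controls, cases) in pROC::roc().
library(pROC)
set.seed(42)
dx <- factor(sample(c("CI", "CN"), 200, replace = TRUE))
# hypothetical CI probabilities: higher, on average, for true CI subjects
p_ci <- ifelse(dx == "CI", runif(200, 0.4, 1), runif(200, 0, 0.6))
roc_obj <- roc(dx, p_ci, levels = rev(levels(dx)), quiet = TRUE)
as.numeric(auc(roc_obj))  # well above 0.5, since cases score higher
```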
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData$DX)
for (class in classes) {
binary_labels <- ifelse(testData$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
modelTrain_LRM2_AUC <-mean_auc
}
print(modelTrain_LRM2_AUC)
## Area under the curve: 0.9097
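For the multi-class flag, the mean of the one-versus-rest AUCs computed above is a macro average. A compact, self-contained sketch on synthetic data (the class names only mirror those used in this analysis, and the random probabilities carry no signal):

```r
# Sketch: macro-averaged one-vs-rest AUC on synthetic data.
library(pROC)
set.seed(1)
labels <- factor(sample(c("CN", "MCI", "Dementia"), 300, replace = TRUE))
# fake class-probability matrix, one column per class
probs <- matrix(runif(300 * 3), ncol = 3,
                dimnames = list(NULL, levels(labels)))
probs <- probs / rowSums(probs)  # rows sum to 1, like predict(type = "prob")
aucs <- sapply(levels(labels), function(cl) {
  as.numeric(auc(roc(as.integer(labels == cl), probs[, cl], quiet = TRUE)))
})
mean(aucs)  # near 0.5 here, because the probabilities are pure noise
```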
df_ENM1<-processed_data
featureName_ENM1<-AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)
param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))
elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
trControl = ctrl, tuneGrid = param_grid)
print(elastic_net_model1)
## glmnet
##
## 454 samples
## 313 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 363, 363, 363, 364
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0 0.00100000 0.7862515 0.50620659
## 0 0.05357895 0.7950183 0.52074555
## 0 0.10615789 0.8038584 0.53160822
## 0 0.15873684 0.8060806 0.53493479
## 0 0.21131579 0.8016606 0.52036722
## 0 0.26389474 0.7994383 0.51283285
## 0 0.31647368 0.7950427 0.49585158
## 0 0.36905263 0.7884249 0.47734282
## 0 0.42163158 0.7928694 0.48613573
## 0 0.47421053 0.7862271 0.46366075
## 0 0.52678947 0.7884249 0.46671213
## 0 0.57936842 0.7818071 0.44556438
## 0 0.63194737 0.7818315 0.44299981
## 0 0.68452632 0.7752381 0.42142882
## 0 0.73710526 0.7774603 0.42422765
## 0 0.78968421 0.7730647 0.41122351
## 0 0.84226316 0.7708669 0.40431346
## 0 0.89484211 0.7664713 0.39134080
## 0 0.94742105 0.7576801 0.36415459
## 0 1.00000000 0.7444444 0.32221545
## 1 0.00100000 0.7224420 0.37390194
## 1 0.05357895 0.6564103 0.01558753
## 1 0.10615789 0.6585836 0.00000000
## 1 0.15873684 0.6585836 0.00000000
## 1 0.21131579 0.6585836 0.00000000
## 1 0.26389474 0.6585836 0.00000000
## 1 0.31647368 0.6585836 0.00000000
## 1 0.36905263 0.6585836 0.00000000
## 1 0.42163158 0.6585836 0.00000000
## 1 0.47421053 0.6585836 0.00000000
## 1 0.52678947 0.6585836 0.00000000
## 1 0.57936842 0.6585836 0.00000000
## 1 0.63194737 0.6585836 0.00000000
## 1 0.68452632 0.6585836 0.00000000
## 1 0.73710526 0.6585836 0.00000000
## 1 0.78968421 0.6585836 0.00000000
## 1 0.84226316 0.6585836 0.00000000
## 1 0.89484211 0.6585836 0.00000000
## 1 0.94742105 0.6585836 0.00000000
## 1 1.00000000 0.6585836 0.00000000
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 0.1587368.
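Note that `expand.grid(alpha = 0:1, ...)` only compares pure ridge (`alpha = 0`) against pure lasso (`alpha = 1`), so the winning `alpha = 0` above is a ridge fit. A sketch of a finer grid that would also evaluate genuine elastic-net mixtures (the step size is illustrative):

```r
# Sketch: a finer alpha grid for glmnet; alpha = 0 is ridge, alpha = 1
# is lasso, and intermediate values are true elastic-net mixtures.
param_grid_fine <- expand.grid(alpha  = seq(0, 1, by = 0.25),
                               lambda = seq(0.001, 1, length = 20))
nrow(param_grid_fine)  # 5 alpha values x 20 lambda values = 100 models
```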
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_elastic_net_model1)
## [1] 0.722638
modelTrain_mean_accuracy_cv_ENM1 <- mean_accuracy_elastic_net_model1
print(modelTrain_mean_accuracy_cv_ENM1)
## [1] 0.722638
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_ENM1$DX)
modelTrain_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.973568281938326"
print(modelTrain_ENM1_trainAccuracy)
## [1] 0.9735683
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_modelTrain_ENM1<- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_modelTrain_ENM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 126 18
## CN 2 48
##
## Accuracy : 0.8969
## 95% CI : (0.8453, 0.9359)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 1.772e-14
##
## Kappa : 0.756
##
## Mcnemar's Test P-Value : 0.0007962
##
## Sensitivity : 0.9844
## Specificity : 0.7273
## Pos Pred Value : 0.8750
## Neg Pred Value : 0.9600
## Prevalence : 0.6598
## Detection Rate : 0.6495
## Detection Prevalence : 0.7423
## Balanced Accuracy : 0.8558
##
## 'Positive' Class : CI
##
cm_modelTrain_ENM1_Accuracy <- cm_modelTrain_ENM1$overall["Accuracy"]
print(cm_modelTrain_ENM1_Accuracy)
## Accuracy
## 0.8969072
cm_modelTrain_ENM1_Kappa <- cm_modelTrain_ENM1$overall["Kappa"]
print(cm_modelTrain_ENM1_Kappa)
## Kappa
## 0.7560362
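The headline statistics in the confusion matrix above can be recomputed directly from its four counts, which is a quick sanity check on `confusionMatrix()` (counts copied from the printed table, with CI as the positive class):

```r
# Sketch: recompute accuracy, sensitivity, specificity by hand from the
# confusion-matrix counts printed above (rows = prediction, cols = reference).
cm <- matrix(c(126, 2, 18, 48), nrow = 2,
             dimnames = list(Prediction = c("CI", "CN"),
                             Reference  = c("CI", "CN")))
accuracy    <- sum(diag(cm)) / sum(cm)           # (126 + 48) / 194
sensitivity <- cm["CI", "CI"] / sum(cm[, "CI"])  # 126 / 128
specificity <- cm["CN", "CN"] / sum(cm[, "CN"])  # 48 / 66
round(c(accuracy, sensitivity, specificity), 4)  # 0.8969 0.9844 0.7273
```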
importance_elastic_net_model1<- varImp(elastic_net_model1)
print(importance_elastic_net_model1)
## glmnet variable importance
##
## only 20 most important variables shown (out of 313)
##
## Overall
## PC3 100.00
## PC2 84.53
## PC1 75.99
## cg23432430 62.27
## cg00962106 52.02
## cg07158503 51.14
## cg06697310 49.97
## cg09727210 48.46
## cg02225060 47.85
## cg06277607 42.73
## cg16338321 42.67
## cg26081710 40.29
## cg21757617 39.97
## cg27272246 38.53
## cg09015880 38.02
## cg00819121 37.58
## cg02887598 37.38
## cg05064044 37.05
## cg00004073 36.87
## cg17429539 36.87
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")
importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG==3 ||METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 5 || METHOD_FEATURE_FLAG==6){
importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)
library(dplyr)
# keep feature names: arrange() drops data.frame rownames
importance_elastic_net_final_model1$Feature <- rownames(importance_elastic_net_final_model1)
Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>% arrange(desc(Overall))
print(Ordered_importance_elastic_net_final_model1)
}
## Overall
## 1 1.175115079
## 2 0.993654080
## 3 0.893532902
## 4 0.732676077
## 5 0.612474354
## 6 0.602081072
## 7 0.588338283
## 8 0.570704740
## 9 0.563482510
## 10 0.503541109
## 11 0.502798118
## 12 0.474902373
## 13 0.471083184
## 14 0.454211128
## 15 0.448210568
## 16 0.443055674
## 17 0.440722143
## 18 0.436918720
## 19 0.434818554
## 20 0.434715737
## 21 0.434680710
## 22 0.433972215
## 23 0.430411586
## 24 0.427929924
## 25 0.422215493
## 26 0.422198375
## 27 0.419095951
## 28 0.416391897
## 29 0.410552311
## 30 0.409248352
## 31 0.406698264
## 32 0.404806291
## 33 0.402903699
## 34 0.400805246
## 35 0.394486618
## 36 0.390668734
## 37 0.389900967
## 38 0.389340890
## 39 0.387621502
## 40 0.385004177
## 41 0.380935907
## 42 0.379773537
## 43 0.377186104
## 44 0.376027600
## 45 0.375109969
## 46 0.373930293
## 47 0.372686038
## 48 0.372235546
## 49 0.370924728
## 50 0.370576575
## 51 0.367976075
## 52 0.366223885
## 53 0.364300270
## 54 0.360907884
## 55 0.360046674
## 56 0.359431713
## 57 0.358367717
## 58 0.354059041
## 59 0.349875077
## 60 0.348220588
## 61 0.347558794
## 62 0.346970910
## 63 0.343088621
## 64 0.342627997
## 65 0.342170666
## 66 0.339101895
## 67 0.338357484
## 68 0.337835302
## 69 0.336324945
## 70 0.335963341
## 71 0.333872881
## 72 0.331817569
## 73 0.331371663
## 74 0.330942610
## 75 0.328639393
## 76 0.323771410
## 77 0.322292263
## 78 0.319676434
## 79 0.319544642
## 80 0.318308210
## 81 0.317808720
## 82 0.316732131
## 83 0.315560112
## 84 0.314822802
## 85 0.313529934
## 86 0.313502513
## 87 0.313123191
## 88 0.312863194
## 89 0.310627304
## 90 0.308590929
## 91 0.307962148
## 92 0.305055661
## 93 0.304979611
## 94 0.295482246
## 95 0.294603886
## 96 0.293745021
## 97 0.292779535
## 98 0.292182519
## 99 0.290525042
## 100 0.290328174
## 101 0.289918614
## 102 0.289438703
## 103 0.287572777
## 104 0.286764062
## 105 0.286579341
## 106 0.284952314
## 107 0.284593118
## 108 0.283601627
## 109 0.281920938
## 110 0.280832312
## 111 0.280811790
## 112 0.279237741
## 113 0.277767873
## 114 0.275541019
## 115 0.275374541
## 116 0.274396251
## 117 0.272194143
## 118 0.272080417
## 119 0.267135990
## 120 0.266712998
## 121 0.264778461
## 122 0.263257295
## 123 0.260048087
## 124 0.257631351
## 125 0.257106538
## 126 0.256870433
## 127 0.256252867
## 128 0.254441406
## 129 0.252879288
## 130 0.249836435
## 131 0.249457861
## 132 0.248673243
## 133 0.246920121
## 134 0.246537427
## 135 0.245847303
## 136 0.245730549
## 137 0.244117616
## 138 0.242509172
## 139 0.241886236
## 140 0.241358458
## 141 0.237364368
## 142 0.235978103
## 143 0.233925841
## 144 0.233586316
## 145 0.231610922
## 146 0.230528569
## 147 0.228792860
## 148 0.228302971
## 149 0.227471071
## 150 0.225862006
## 151 0.224066021
## 152 0.223866878
## 153 0.223218772
## 154 0.223133707
## 155 0.221793118
## 156 0.221349475
## 157 0.220952803
## 158 0.220182381
## 159 0.219184432
## 160 0.218269681
## 161 0.215903421
## 162 0.213164979
## 163 0.212470769
## 164 0.211823176
## 165 0.210241885
## 166 0.209718601
## 167 0.209499987
## 168 0.209444836
## 169 0.207098152
## 170 0.206187830
## 171 0.204908065
## 172 0.204676885
## 173 0.204514407
## 174 0.204180849
## 175 0.203066027
## 176 0.202829795
## 177 0.202823903
## 178 0.201434203
## 179 0.201361120
## 180 0.200291628
## 181 0.199515674
## 182 0.198575683
## 183 0.198554091
## 184 0.195497345
## 185 0.195345198
## 186 0.194356872
## 187 0.193133432
## 188 0.191480434
## 189 0.189610131
## 190 0.189478770
## 191 0.188702000
## 192 0.188663065
## 193 0.187063414
## 194 0.184895132
## 195 0.184647013
## 196 0.184356035
## 197 0.182974794
## 198 0.181193135
## 199 0.179552584
## 200 0.179143297
## 201 0.177891278
## 202 0.175960746
## 203 0.175266287
## 204 0.174684681
## 205 0.174609454
## 206 0.173276914
## 207 0.172729225
## 208 0.167349512
## 209 0.165650393
## 210 0.163894082
## 211 0.163135971
## 212 0.162721369
## 213 0.159540253
## 214 0.158955036
## 215 0.157729754
## 216 0.157503697
## 217 0.157034142
## 218 0.156901777
## 219 0.156695500
## 220 0.156357021
## 221 0.151742081
## 222 0.151421928
## 223 0.151087433
## 224 0.150404632
## 225 0.149817571
## 226 0.149598423
## 227 0.145446990
## 228 0.144231745
## 229 0.143096593
## 230 0.142880919
## 231 0.141833674
## 232 0.141317619
## 233 0.140169056
## 234 0.139551793
## 235 0.138351158
## 236 0.137248579
## 237 0.136001213
## 238 0.134300553
## 239 0.134299757
## 240 0.134052000
## 241 0.133116228
## 242 0.131767425
## 243 0.131064823
## 244 0.130556693
## 245 0.130154319
## 246 0.128736660
## 247 0.127691082
## 248 0.124575154
## 249 0.124533911
## 250 0.124189821
## 251 0.121457115
## 252 0.121271345
## 253 0.120158593
## 254 0.119486520
## 255 0.115466054
## 256 0.112678325
## 257 0.112454110
## 258 0.111654603
## 259 0.110686563
## 260 0.106876769
## 261 0.104774506
## 262 0.104372371
## 263 0.104297384
## 264 0.102104015
## 265 0.101892318
## 266 0.100105442
## 267 0.099500057
## 268 0.098368171
## 269 0.096062061
## 270 0.094860869
## 271 0.091132640
## 272 0.090185789
## 273 0.089119388
## 274 0.088131999
## 275 0.087623209
## 276 0.084493797
## 277 0.084458536
## 278 0.082490173
## 279 0.082237386
## 280 0.082228018
## 281 0.080509490
## 282 0.073850360
## 283 0.070585757
## 284 0.068368220
## 285 0.065589030
## 286 0.065122020
## 287 0.063538752
## 288 0.063498784
## 289 0.063469176
## 290 0.062957120
## 291 0.061269421
## 292 0.059626725
## 293 0.058543870
## 294 0.058208552
## 295 0.054151316
## 296 0.052344472
## 297 0.052168684
## 298 0.047801344
## 299 0.043828013
## 300 0.041454820
## 301 0.036520783
## 302 0.036015991
## 303 0.027482469
## 304 0.025601429
## 305 0.021803023
## 306 0.020724048
## 307 0.015792103
## 308 0.014780493
## 309 0.011997938
## 310 0.006688169
## 311 0.006359318
## 312 0.005485135
## 313 0.002381055
if(METHOD_FEATURE_FLAG==1){
# For the multi-class case, rank each feature by its maximum
# importance across classes; add a MaxImportance column for sorting.
importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_elastic_net_model1_df)
}
if(METHOD_FEATURE_FLAG == 1){
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_elastic_net_model1_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_elastic_net_model1_df,n=20)$Feature)
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
modelTrain_ENM1_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG ==6){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
modelTrain_ENM1_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
modelTrain_ENM1_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## Area under the curve: 0.9458
if (METHOD_FEATURE_FLAG ==1){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_ENM1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
modelTrain_ENM1_AUC <-mean_auc
}
print(modelTrain_ENM1_AUC)
## Area under the curve: 0.9458
library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
# Start point of parallel processing
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1<-processed_data
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
xgb_model <- caret::train(
DX ~ ., data = trainData_XGB1,
method = "xgbTree", trControl = cv_control,
metric = "Accuracy"
)
print(xgb_model)
## eXtreme Gradient Boosting
##
## 454 samples
## 313 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 363, 363, 363, 364
## Resampling results across tuning parameters:
##
## eta max_depth colsample_bytree subsample nrounds Accuracy Kappa
## 0.3 1 0.6 0.50 50 0.6012210 0.018240514
## 0.3 1 0.6 0.50 100 0.6055678 0.058191426
## 0.3 1 0.6 0.50 150 0.6320879 0.142055824
## 0.3 1 0.6 0.75 50 0.6409524 0.078806663
## 0.3 1 0.6 0.75 100 0.6410012 0.118200965
## 0.3 1 0.6 0.75 150 0.6541880 0.137896928
## 0.3 1 0.6 1.00 50 0.5880342 -0.070374087
## 0.3 1 0.6 1.00 100 0.6035653 0.006021368
## 0.3 1 0.6 1.00 150 0.6144567 0.039738408
## 0.3 1 0.8 0.50 50 0.6124542 0.066273466
## 0.3 1 0.8 0.50 100 0.6586325 0.181792922
## 0.3 1 0.8 0.50 150 0.6828083 0.242903429
## 0.3 1 0.8 0.75 50 0.6012698 -0.001435643
## 0.3 1 0.8 0.75 100 0.6276679 0.078829544
## 0.3 1 0.8 0.75 150 0.6232479 0.084532971
## 0.3 1 0.8 1.00 50 0.6079365 -0.029532168
## 0.3 1 0.8 1.00 100 0.6189988 0.032152110
## 0.3 1 0.8 1.00 150 0.6079853 0.032642931
## 0.3 2 0.6 0.50 50 0.6475946 0.131147712
## 0.3 2 0.6 0.50 100 0.6432479 0.102854477
## 0.3 2 0.6 0.50 150 0.6586569 0.143202170
## 0.3 2 0.6 0.75 50 0.6166789 0.052627786
## 0.3 2 0.6 0.75 100 0.6299145 0.070648048
## 0.3 2 0.6 0.75 150 0.6100366 0.034797450
## 0.3 2 0.6 1.00 50 0.6056410 0.004372945
## 0.3 2 0.6 1.00 100 0.6144078 0.040099634
## 0.3 2 0.6 1.00 150 0.6342857 0.070997992
## 0.3 2 0.8 0.50 50 0.6322100 0.101269091
## 0.3 2 0.8 0.50 100 0.6674237 0.211918172
## 0.3 2 0.8 0.50 150 0.6718926 0.217145929
## 0.3 2 0.8 0.75 50 0.6079121 0.017187490
## 0.3 2 0.8 0.75 100 0.6124054 0.016675210
## 0.3 2 0.8 0.75 150 0.6168498 0.026072802
## 0.3 2 0.8 1.00 50 0.6123321 0.013856988
## 0.3 2 0.8 1.00 100 0.6321123 0.068676845
## 0.3 2 0.8 1.00 150 0.6475458 0.109937123
## 0.3 3 0.6 0.50 50 0.6786569 0.205671560
## 0.3 3 0.6 0.50 100 0.6829792 0.222394030
## 0.3 3 0.6 0.50 150 0.7072283 0.271933929
## 0.3 3 0.6 0.75 50 0.6475702 0.123189329
## 0.3 3 0.6 0.75 100 0.6497436 0.127721397
## 0.3 3 0.6 0.75 150 0.6607326 0.159664842
## 0.3 3 0.6 1.00 50 0.6321856 0.043792614
## 0.3 3 0.6 1.00 100 0.6476190 0.085904571
## 0.3 3 0.6 1.00 150 0.6431990 0.081282761
## 0.3 3 0.8 0.50 50 0.6828327 0.215983183
## 0.3 3 0.8 0.50 100 0.6871795 0.222096767
## 0.3 3 0.8 0.50 150 0.6981685 0.255666694
## 0.3 3 0.8 0.75 50 0.6389011 0.087659333
## 0.3 3 0.8 0.75 100 0.6564835 0.137873309
## 0.3 3 0.8 0.75 150 0.6674725 0.169998819
## 0.3 3 0.8 1.00 50 0.6520147 0.093178940
## 0.3 3 0.8 1.00 100 0.6674725 0.139474684
## 0.3 3 0.8 1.00 150 0.6652503 0.131307036
## 0.4 1 0.6 0.50 50 0.6167277 0.082109043
## 0.4 1 0.6 0.50 100 0.6409280 0.146697270
## 0.4 1 0.6 0.50 150 0.6519170 0.171609925
## 0.4 1 0.6 0.75 50 0.6012698 0.033591049
## 0.4 1 0.6 0.75 100 0.6365812 0.124706091
## 0.4 1 0.6 0.75 150 0.6365812 0.140354883
## 0.4 1 0.6 1.00 50 0.5947009 -0.036670975
## 0.4 1 0.6 1.00 100 0.6100611 0.019710601
## 0.4 1 0.6 1.00 150 0.6078632 0.028453638
## 0.4 1 0.8 0.50 50 0.6497436 0.143903004
## 0.4 1 0.8 0.50 100 0.6628816 0.183617609
## 0.4 1 0.8 0.50 150 0.6540904 0.181250729
## 0.4 1 0.8 0.75 50 0.5858852 0.002546459
## 0.4 1 0.8 0.75 100 0.6410256 0.133937136
## 0.4 1 0.8 0.75 150 0.6607570 0.181258526
## 0.4 1 0.8 1.00 50 0.5902076 -0.032106357
## 0.4 1 0.8 1.00 100 0.6100611 0.023006259
## 0.4 1 0.8 1.00 150 0.6122589 0.047670623
## 0.4 2 0.6 0.50 50 0.6608059 0.195170043
## 0.4 2 0.6 0.50 100 0.6652259 0.185859827
## 0.4 2 0.6 0.50 150 0.6740904 0.206642880
## 0.4 2 0.6 0.75 50 0.6055922 0.032593241
## 0.4 2 0.6 0.75 100 0.6144811 0.064984931
## 0.4 2 0.6 0.75 150 0.6210745 0.076291889
## 0.4 2 0.6 1.00 50 0.6408791 0.082904404
## 0.4 2 0.6 1.00 100 0.6299878 0.058501094
## 0.4 2 0.6 1.00 150 0.6322100 0.072710378
## 0.4 2 0.8 0.50 50 0.6432479 0.151942575
## 0.4 2 0.8 0.50 100 0.6697436 0.189291268
## 0.4 2 0.8 0.50 150 0.6741636 0.211667595
## 0.4 2 0.8 0.75 50 0.6432479 0.123133613
## 0.4 2 0.8 0.75 100 0.6586569 0.162919635
## 0.4 2 0.8 0.75 150 0.6674481 0.184550029
## 0.4 2 0.8 1.00 50 0.6431746 0.096013783
## 0.4 2 0.8 1.00 100 0.6542125 0.108411031
## 0.4 2 0.8 1.00 150 0.6366056 0.069966742
## 0.4 3 0.6 0.50 50 0.6276679 0.082881544
## 0.4 3 0.6 0.50 100 0.6519658 0.142956134
## 0.4 3 0.6 0.50 150 0.6629792 0.176451936
## 0.4 3 0.6 0.75 50 0.6365812 0.090716228
## 0.4 3 0.6 0.75 100 0.6277167 0.069989364
## 0.4 3 0.6 0.75 150 0.6365568 0.088516395
## 0.4 3 0.6 1.00 50 0.6431746 0.087524193
## 0.4 3 0.6 1.00 100 0.6300122 0.046638358
## 0.4 3 0.6 1.00 150 0.6322344 0.058269839
## 0.4 3 0.8 0.50 50 0.6079853 0.039012378
## 0.4 3 0.8 0.50 100 0.6210989 0.070892718
## 0.4 3 0.8 0.50 150 0.6320879 0.102593312
## 0.4 3 0.8 0.75 50 0.6543346 0.115610534
## 0.4 3 0.8 0.75 100 0.6631746 0.140095927
## 0.4 3 0.8 0.75 150 0.6609280 0.130920788
## 0.4 3 0.8 1.00 50 0.6298901 0.063740487
## 0.4 3 0.8 1.00 100 0.6277656 0.041438805
## 0.4 3 0.8 1.00 150 0.6277411 0.048692065
##
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 150, max_depth = 3, eta = 0.3, gamma = 0, colsample_bytree = 0.6, min_child_weight = 1 and subsample = 0.5.
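No `tuneGrid` was supplied for `xgbTree`, so caret generated its default grid (from `tuneLength = 3`). Reading the candidate values off the table above, the 108 rows can be reconstructed as follows (treating caret's default-grid construction as an assumption about its internals):

```r
# Sketch: reconstruct the default caret grid for method = "xgbTree"
# from the tuning values listed in the table above.
default_grid <- expand.grid(eta = c(0.3, 0.4),
                            max_depth = 1:3,
                            colsample_bytree = c(0.6, 0.8),
                            subsample = c(0.5, 0.75, 1),
                            nrounds = c(50, 100, 150),
                            gamma = 0, min_child_weight = 1)
nrow(default_grid)  # 2 x 3 x 2 x 3 x 3 = 108 combinations
```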
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.6381592
modelTrain_mean_accuracy_cv_xgb <- mean_accuracy_xgb_model
print(modelTrain_mean_accuracy_cv_xgb)
## [1] 0.6381592
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
modelTrain_xgb_trainAccuracy <- train_accuracy
print(paste("Training Accuracy: ", modelTrain_xgb_trainAccuracy))
## [1] "Training Accuracy: 1"
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_modelTrain_xgb <- caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_modelTrain_xgb)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 120 40
## CN 8 26
##
## Accuracy : 0.7526
## 95% CI : (0.6857, 0.8116)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 0.003346
##
## Kappa : 0.3755
##
## Mcnemar's Test P-Value : 7.66e-06
##
## Sensitivity : 0.9375
## Specificity : 0.3939
## Pos Pred Value : 0.7500
## Neg Pred Value : 0.7647
## Prevalence : 0.6598
## Detection Rate : 0.6186
## Detection Prevalence : 0.8247
## Balanced Accuracy : 0.6657
##
## 'Positive' Class : CI
##
cm_modelTrain_xgb_Accuracy <- cm_modelTrain_xgb$overall["Accuracy"]
cm_modelTrain_xgb_Kappa <- cm_modelTrain_xgb$overall["Kappa"]
print(cm_modelTrain_xgb_Accuracy)
## Accuracy
## 0.7525773
print(cm_modelTrain_xgb_Kappa)
## Kappa
## 0.3755365
importance_xgb_model<- varImp(xgb_model)
print(importance_xgb_model)
## xgbTree variable importance
##
## only 20 most important variables shown (out of 313)
##
## Overall
## cg23432430 100.00
## age.now 63.50
## cg11438323 61.99
## cg11540596 57.57
## cg03660162 52.02
## cg17002719 45.38
## cg09120722 43.33
## cg17002338 43.12
## cg07158503 42.32
## cg14168080 41.50
## cg11227702 41.48
## cg07634717 41.15
## cg18816397 40.96
## cg02122327 40.53
## cg19799454 40.13
## cg13573375 39.19
## cg03088219 38.67
## PC2 38.04
## cg20678988 37.49
## cg11019791 37.10
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")
importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)
ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
## Feature Gain Cover Frequency Importance
## <char> <num> <num> <num> <num>
## 1: cg23432430 3.027514e-02 0.0257936980 0.012216405 3.027514e-02
## 2: age.now 1.922394e-02 0.0178941206 0.012216405 1.922394e-02
## 3: cg11438323 1.876836e-02 0.0148479474 0.005235602 1.876836e-02
## 4: cg11540596 1.742898e-02 0.0129093067 0.013961606 1.742898e-02
## 5: cg03660162 1.574834e-02 0.0084935366 0.008726003 1.574834e-02
## ---
## 248: cg25649515 8.420077e-05 0.0007715400 0.001745201 8.420077e-05
## 249: cg07028768 8.255277e-05 0.0005230628 0.001745201 8.255277e-05
## 250: cg27224751 6.555581e-05 0.0004160068 0.001745201 6.555581e-05
## 251: cg18819889 6.258985e-05 0.0004043872 0.001745201 6.258985e-05
## 252: cg12776173 2.085415e-05 0.0004195156 0.001745201 2.085415e-05
stopCluster(c2)
registerDoSEQ()
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
modelTrain_xgb_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
modelTrain_xgb_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
modelTrain_xgb_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## Area under the curve: 0.7603
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_XGB1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
modelTrain_xgb_AUC<-mean_auc
}
print(modelTrain_xgb_AUC)
## Area under the curve: 0.7603
library(caret)
library(randomForest)
df_RFM1<-processed_data
featureName_RFM1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]
X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)
rf_model <- caret::train(
DX ~ ., data = train_data_RFM1,
method = "rf", trControl = ctrl,
metric = "Accuracy",
importance = TRUE
)
print(rf_model)
## Random Forest
##
## 454 samples
## 313 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 363, 363, 363, 364
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.6608059 0.008374734
## 157 0.6674481 0.045143239
## 313 0.6630281 0.039527981
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 157.
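The three `mtry` candidates (2, 157, 313) come from caret's default grid, which spaces `tuneLength = 3` values evenly from 2 up to the number of predictors. The sketch below mirrors that logic (an assumption about caret's `var_seq` behavior, not a call into caret itself):

```r
# Sketch: where the mtry candidates 2, 157, 313 likely come from --
# tuneLength = 3 values spaced evenly over 2..p for p = 313 predictors.
p <- 313
mtry_candidates <- floor(seq(2, p, length.out = 3))
mtry_candidates  # 2 157 313, matching the tuning table above
```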
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
modelTrain_mean_accuracy_cv_rf <- mean_accuracy_rf_model
print(modelTrain_mean_accuracy_cv_rf)
## [1] 0.6637607
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")
train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
modelTrain_rf_trainAccuracy <- train_accuracy
print(modelTrain_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_modelTrain_rf <- caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_modelTrain_rf)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 127 63
## CN 1 3
##
## Accuracy : 0.6701
## 95% CI : (0.5991, 0.7358)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 0.4131
##
## Kappa : 0.0487
##
## Mcnemar's Test P-Value : 2.44e-14
##
## Sensitivity : 0.99219
## Specificity : 0.04545
## Pos Pred Value : 0.66842
## Neg Pred Value : 0.75000
## Prevalence : 0.65979
## Detection Rate : 0.65464
## Detection Prevalence : 0.97938
## Balanced Accuracy : 0.51882
##
## 'Positive' Class : CI
##
cm_modelTrain_rf_Accuracy <- cm_modelTrain_rf$overall["Accuracy"]
cm_modelTrain_rf_Kappa <- cm_modelTrain_rf$overall["Kappa"]
print(cm_modelTrain_rf_Accuracy)
## Accuracy
## 0.6701031
print(cm_modelTrain_rf_Kappa)
## Kappa
## 0.04872816
importance_rf_model <- varImp(rf_model)
print(importance_rf_model)
## rf variable importance
##
## only 20 most important variables shown (out of 313)
##
## Importance
## cg23432430 100.00
## cg11019791 74.47
## cg03749159 72.88
## cg11331837 70.13
## cg21697769 67.79
## cg01008088 67.13
## cg04768387 66.18
## cg16431720 63.43
## cg00415024 62.26
## cg12784167 62.17
## cg23159970 61.95
## cg24851651 59.58
## cg17042243 59.56
## cg09451339 57.62
## PC3 55.39
## cg17386240 54.93
## PC2 54.69
## cg04109990 54.52
## cg06697310 54.35
## cg14192979 54.33
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")
importance_rf_model_df<-importance_rf_model$importance
if(METHOD_FEATURE_FLAG==5){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
# keep feature names: arrange() drops data.frame rownames
importance_rf_final_model$Feature <- rownames(importance_rf_final_model)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(MCI))
print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==6){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
# keep feature names: arrange() drops data.frame rownames
importance_rf_final_model$Feature <- rownames(importance_rf_final_model)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(Dementia))
print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==3){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
# keep feature names: arrange() drops data.frame rownames
importance_rf_final_model$Feature <- rownames(importance_rf_final_model)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(CI))
print(Ordered_importance_rf_final_model)
}
## CI CN
## 1 4.533646382 4.533646382
## 2 2.826616756 2.826616756
## 3 2.720567988 2.720567988
## 4 2.536590584 2.536590584
## 5 2.380574060 2.380574060
## 6 2.335993233 2.335993233
## 7 2.272574779 2.272574779
## 8 2.088853264 2.088853264
## 9 2.010312547 2.010312547
## 10 2.004291812 2.004291812
## ...
## 313 -2.151840452 -2.151840452
## (output truncated: 313 rows in total, ordered by descending CI importance)
if(METHOD_FEATURE_FLAG==1){
# for the multi classification case,
# for each feature, we will choose the maximum importance value
# Add a column for the maximum importance
importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
importance_rf_model_df <- importance_rf_model_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_rf_model_df)
}
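The pmax-based "maximum importance across classes" step used above can be sketched on toy data (hypothetical per-class importance values, not from the ADNI models):

```r
# Toy per-class importances; pmax takes the element-wise (row-wise) maximum
imp <- data.frame(CN = c(10, 40), Dementia = c(30, 20), MCI = c(25, 35),
                  row.names = c("cgA", "cgB"))
imp$MaxImportance <- pmax(imp$CN, imp$Dementia, imp$MCI)
print(imp$MaxImportance)  # 30 40
```

Each feature keeps its single best importance among the three classes, which is what the `mutate(MaxImportance = pmax(...))` call computes on the real importance data frame.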
if(METHOD_FEATURE_FLAG == 1){
importance_melted_rf_model_df <- importance_rf_model_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_rf_model_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_rf_model_df,n=20)$Feature)
importance_melted_rf_model_df <- importance_rf_model_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
print(auc_value)
modelTrain_rf_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
print(auc_value)
modelTrain_rf_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
print(auc_value)
modelTrain_rf_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## Area under the curve: 0.69
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_RFM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
modelTrain_rf_AUC <- mean_auc
}
print(modelTrain_rf_AUC)
## Area under the curve: 0.69
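The one-versus-rest binarization used in the multiclass AUC loop above can be sketched on toy labels (hypothetical diagnoses, base R only — no pROC needed for the sketch):

```r
# Toy diagnosis labels; factor levels sort alphabetically: "CN" "Dementia" "MCI"
labels <- factor(c("CN", "MCI", "Dementia", "CN", "Dementia", "MCI"))
classes <- levels(labels)
# For each class, build a 0/1 vector: 1 = this class, 0 = all the rest.
# One ROC curve is then fit per binarized vector against that class's probabilities.
binarized <- lapply(classes, function(cls) as.integer(labels == cls))
names(binarized) <- classes
print(binarized$CN)  # 1 0 0 1 0 0
```

Averaging the per-class AUCs of these binarized problems gives the mean one-versus-rest AUC reported above.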
df_SVM<-processed_data
featureName_SVM1<-AfterProcess_FeatureName
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]
X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)
svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
method = "svmRadial",
trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel
##
## 454 samples
## 313 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 363, 363, 364, 363
## Resampling results across tuning parameters:
##
## C Accuracy Kappa
## 0.25 0.8281563 0.6291507
## 0.50 0.8281319 0.6251914
## 1.00 0.8325763 0.6330576
##
## Tuning parameter 'sigma' was held constant at a value of 0.001632383
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.001632383 and C = 1.
print(svm_model$bestTune)
## sigma C
## 3 0.001632383 1
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.8296215
modelTrain_mean_accuracy_cv_svm <- mean_accuracy_svm_model
print(modelTrain_mean_accuracy_cv_svm)
## [1] 0.8296215
train_predictions <- predict(svm_model, newdata = train_data_SVM1)
train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.991189427312775"
modelTrain_svm_trainAccuracy <-train_accuracy
print(modelTrain_svm_trainAccuracy)
## [1] 0.9911894
predictions <- predict(svm_model, newdata = test_data_SVM1)
cm_modelTrain_svm <- caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_modelTrain_svm)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 113 11
## CN 15 55
##
## Accuracy : 0.866
## 95% CI : (0.8098, 0.9105)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 5.65e-11
##
## Kappa : 0.7058
##
## Mcnemar's Test P-Value : 0.5563
##
## Sensitivity : 0.8828
## Specificity : 0.8333
## Pos Pred Value : 0.9113
## Neg Pred Value : 0.7857
## Prevalence : 0.6598
## Detection Rate : 0.5825
## Detection Prevalence : 0.6392
## Balanced Accuracy : 0.8581
##
## 'Positive' Class : CI
##
cm_modelTrain_svm_Accuracy <- cm_modelTrain_svm$overall["Accuracy"]
cm_modelTrain_svm_Kappa <- cm_modelTrain_svm$overall["Kappa"]
print(cm_modelTrain_svm_Accuracy)
## Accuracy
## 0.8659794
print(cm_modelTrain_svm_Kappa)
## Kappa
## 0.7057863
Let’s take a look at the feature importance of the trained model.
library(iml)
predictor_SVM <- Predictor$new(svm_model,data = df_SVM,y=df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM,loss="ce")
print(importance_SVM)
## Interpretation method: FeatureImp
## error function: ce
##
## Analysed predictor:
## Prediction task: classification
## Classes:
##
## Analysed data:
## Sampling from data.frame with 648 rows and 314 columns.
##
##
## Head of results:
## feature importance.05 importance importance.95 permutation.error
## 1 cg23432430 1.140000 1.166667 1.193333 0.05401235
## 2 cg12333628 1.080000 1.133333 1.133333 0.05246914
## 3 cg00962106 1.053333 1.133333 1.186667 0.05246914
## 4 cg03600007 1.040000 1.100000 1.100000 0.05092593
## 5 cg19799454 1.040000 1.100000 1.100000 0.05092593
## 6 cg15775217 1.013333 1.100000 1.126667 0.05092593
plot(importance_SVM)
library(vip)
vip(svm_model, method = "permute", train = train_data_SVM1, target = "DX",
nsim = 10, metric = "bal_accuracy", pred_wrapper = predict)
importance_SVM_df<-importance_SVM$results
if(METHOD_FEATURE_FLAG == 5){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The auc vlue is:")
auc_value <- roc_curve$auc
modelTrain_svm_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4|| METHOD_FEATURE_FLAG==6){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The auc vlue is:")
auc_value <- roc_curve$auc
modelTrain_svm_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The auc vlue is:")
auc_value <- roc_curve$auc
modelTrain_svm_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
##
## Call:
## roc.default(response = test_data_SVM1$DX, predictor = prob_predictions[, "CI"], levels = rev(levels(test_data_SVM1$DX)))
##
## Data: prob_predictions[, "CI"] in 66 controls (test_data_SVM1$DX CN) < 128 cases (test_data_SVM1$DX CI).
## Area under the curve: 0.9156
## [1] "The auc vlue is:"
## Area under the curve: 0.9156
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_SVM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
modelTrain_svm_AUC <- mean_auc
}
# GO TO the "INPUT" section to set the number of common features needed
NUM_COMMON_FEATURES <- NUM_COMMON_FEATURES_SET
The feature importances cannot be combined directly, since they are not all on the same scale; for example, the SVM model uses a different method (permutation importance) to measure feature importance.
So let’s rescale the importances to put them in the same range.
First, let’s process each data frame to ensure they have a consistent format.
if(METHOD_FEATURE_FLAG == 3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
# Process the dataframe to ensure they have consistent format.
# SVM
importance_SVM_df_processed<-importance_SVM_df[,c("importance","feature")]
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "feature"] <- "Feature"
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "importance"] <- "Importance_SVM"
head(importance_SVM_df_processed)
# LRM
importance_model_LRM1_df_processed<-importance_model_LRM1_df
importance_model_LRM1_df_processed$Feature<-rownames(importance_model_LRM1_df_processed)
colnames(importance_model_LRM1_df_processed)[colnames(importance_model_LRM1_df_processed) == "Overall"] <- "Importance_LRM1"
head(importance_model_LRM1_df_processed)
# Elastic Net
importance_elastic_net_model1_df_processed<-importance_elastic_net_model1_df
importance_elastic_net_model1_df_processed$Feature<-rownames(importance_elastic_net_model1_df_processed)
colnames(importance_elastic_net_model1_df_processed)[colnames(importance_elastic_net_model1_df_processed) == "Overall"] <- "Importance_ENM1"
head(importance_elastic_net_model1_df_processed)
# XGBoost
importance_xgb_model_df_processed<-importance_xgb_model_df
importance_xgb_model_df_processed$Feature<-rownames(importance_xgb_model_df_processed)
colnames(importance_xgb_model_df_processed)[colnames(importance_xgb_model_df_processed) == "Overall"] <- "Importance_XGB"
head(importance_xgb_model_df_processed)
# RF
importance_rf_model_df_processed <- importance_rf_model_df
if (METHOD_FEATURE_FLAG_NUM == 3){
importance_rf_model_df_processed$Importance <- rowMeans(importance_rf_model_df_processed)
importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(CI, CN))
colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "Importance"] <- "Importance_RF"
}
if (METHOD_FEATURE_FLAG_NUM == 4){
importance_rf_model_df_processed$Importance <- rowMeans(importance_rf_model_df_processed)
importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(Dementia, CN))
colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "Importance"] <- "Importance_RF"
}
if (METHOD_FEATURE_FLAG_NUM == 5){
importance_rf_model_df_processed$Importance <- rowMeans(importance_rf_model_df_processed)
importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(MCI, CN))
colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "Importance"] <- "Importance_RF"
}
if (METHOD_FEATURE_FLAG_NUM == 6){
importance_rf_model_df_processed$Importance <- rowMeans(importance_rf_model_df_processed)
importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(MCI, Dementia))
colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "Importance"] <- "Importance_RF"
}
head(importance_rf_model_df_processed)
}
From the above (binary case), each data frame now has the same structure, with an ‘Importance_*’ column and a ‘Feature’ column.
For the multiclass case, see below. Except for the XGBoost and SVM models, each model’s feature importance is taken as the maximum importance across the classes.
if(METHOD_FEATURE_FLAG == 1){
# Process the dataframe to ensure they have consistent format.
# SVM
importance_SVM_df_processed<-importance_SVM_df[,c("importance","feature")]
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "feature"] <- "Feature"
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "importance"] <- "Importance_SVM"
head(importance_SVM_df_processed)
# LRM
importance_model_LRM1_df_processed<-importance_model_LRM1_df
colnames(importance_model_LRM1_df_processed)[colnames(importance_model_LRM1_df_processed) == "MaxImportance"] <- "Importance_LRM1"
importance_model_LRM1_df_processed <- subset(importance_model_LRM1_df_processed, select = -c(Dementia,MCI, CN))
head(importance_model_LRM1_df_processed)
# Elastic Net
importance_elastic_net_model1_df_processed<-importance_elastic_net_model1_df
importance_elastic_net_model1_df_processed <- subset(importance_elastic_net_model1_df_processed, select = -c(Dementia,MCI, CN))
colnames(importance_elastic_net_model1_df_processed)[colnames(importance_elastic_net_model1_df_processed) == "MaxImportance"] <- "Importance_ENM1"
head(importance_elastic_net_model1_df_processed)
# XGBoost
importance_xgb_model_df_processed<-importance_xgb_model_df
importance_xgb_model_df_processed$Feature<-rownames(importance_xgb_model_df_processed)
colnames(importance_xgb_model_df_processed)[colnames(importance_xgb_model_df_processed) == "Overall"] <- "Importance_XGB"
head(importance_xgb_model_df_processed)
# RF
importance_rf_model_df_processed <- importance_rf_model_df
importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(Dementia,MCI, CN))
colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "MaxImportance"] <- "Importance_RF"
head(importance_rf_model_df_processed)
}
Then let’s do the scaling; here we choose min-max scaling.
importance_list <- list(logistic = importance_model_LRM1_df_processed,
xgb = importance_xgb_model_df_processed,
elastic_net = importance_elastic_net_model1_df_processed,
rf = importance_rf_model_df_processed,
svm = importance_SVM_df_processed)
min_max_scale_Imp<-function(df){
x<-df[, grepl("Importance_", colnames(df))]
df[, grepl("Importance_", colnames(df))] <- (x - min(x)) / (max(x) - min(x))
return(df)
}
for (i in seq_along(importance_list)) {
importance_list[[i]] <- min_max_scale_Imp(importance_list[[i]])
}
# Print each data frame after scaling
print(head(importance_list[[1]]))
## Importance_LRM1 Feature
## age.now 0.00416345 age.now
## PC1 0.66841052 PC1
## PC2 0.43081520 PC2
## PC3 1.00000000 PC3
## cg18993517 0.09667700 cg18993517
## cg13573375 0.09253355 cg13573375
print(head(importance_list[[2]]))
## Importance_XGB Feature
## cg23432430 1.0000000 cg23432430
## age.now 0.6349744 age.now
## cg11438323 0.6199265 cg11438323
## cg11540596 0.5756862 cg11540596
## cg03660162 0.5201740 cg03660162
## cg17002719 0.4538267 cg17002719
print(head(importance_list[[3]]))
## Importance_ENM1 Feature
## age.now 0.003392297 age.now
## PC1 0.759892549 PC1
## PC2 0.845266706 PC2
## PC3 1.000000000 PC3
## cg18993517 0.186716182 cg18993517
## cg13573375 0.150726625 cg13573375
print(head(importance_list[[4]]))
## Importance_RF Feature
## age.now 0.5289698 age.now
## PC1 0.3091080 PC1
## PC2 0.5468701 PC2
## PC3 0.5539209 PC3
## cg18993517 0.2797494 cg18993517
## cg13573375 0.2691145 cg13573375
print(head(importance_list[[5]]))
## Importance_SVM Feature
## 1 1.0000000 cg23432430
## 2 0.8333333 cg12333628
## 3 0.8333333 cg00962106
## 4 0.6666667 cg03600007
## 5 0.6666667 cg19799454
## 6 0.6666667 cg15775217
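As a self-contained sketch of the min-max scaling helper on toy data (hypothetical feature names and values, not from the ADNI set):

```r
# Copy of the scaling helper used above: rescale Importance_* columns to [0, 1]
min_max_scale_Imp <- function(df) {
  x <- df[, grepl("Importance_", colnames(df))]
  df[, grepl("Importance_", colnames(df))] <- (x - min(x)) / (max(x) - min(x))
  df
}
toy <- data.frame(Importance_RF = c(2, 6, 10), Feature = c("cgA", "cgB", "cgC"))
scaled <- min_max_scale_Imp(toy)
print(scaled$Importance_RF)  # [1] 0.0 0.5 1.0
```

The minimum maps to 0 and the maximum to 1, so importances from different models become directly comparable after scaling.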
Now let’s merge the data frames of scaled feature importances.
# Merge all importances
combined_importance <- Reduce(function(x, y) merge(x, y, by = "Feature", all = TRUE), importance_list)
head(combined_importance)
# Replace NA with 0
combined_importance[is.na(combined_importance)] <- 0
# Exclude DX, as it's label
combined_importance <- combined_importance %>%
filter(Feature != "DX")
# View the filtered dataframe
head(combined_importance)
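The `Reduce` + `merge` step above performs a full outer join on `Feature` across all model importance tables. A minimal sketch on toy frames (hypothetical features and values):

```r
# Two toy importance tables with partially overlapping features
a <- data.frame(Feature = c("cgA", "cgB"), Importance_RF = c(1, 0.5))
b <- data.frame(Feature = c("cgB", "cgC"), Importance_SVM = c(0.8, 0.2))
# all = TRUE keeps every feature that appears in at least one table (outer join)
combined <- Reduce(function(x, y) merge(x, y, by = "Feature", all = TRUE), list(a, b))
# A feature missing from some model's table gets importance 0, as in the pipeline
combined[is.na(combined)] <- 0
print(combined)
```

Features seen by only one model (here cgA and cgC) survive the merge with an NA that is then set to 0, matching the `combined_importance[is.na(combined_importance)] <- 0` step above.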
First, select the top N important features based on average importance (see the following).
combined_importance_AVF <- combined_importance
# Calculate average importance
combined_importance_AVF$Average_Importance <- rowMeans(combined_importance_AVF[,-1])
head(combined_importance_AVF)
combined_importance_Avg_ordered <- combined_importance_AVF[order(-combined_importance_AVF$Average_Importance),]
head(combined_importance_Avg_ordered)
# Top Number of common important features
print("the Top number of common features here is set to:")
## [1] "the Top number of common features here is set to:"
print(NUM_COMMON_FEATURES)
## [1] 20
top_Num_combined_importance_Avg_ordered <- head(combined_importance_Avg_ordered,n = NUM_COMMON_FEATURES)
print(top_Num_combined_importance_Avg_ordered)
## Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 275 cg23432430 0.4919901 1.0000000 0.6227286 1.0000000 1.0000000 0.8229437
## 313 PC3 1.0000000 0.0000000 1.0000000 0.5539209 0.1666667 0.5441175
## 312 PC2 0.4308152 0.3804470 0.8452667 0.5468701 0.3333333 0.5073465
## 19 cg00962106 0.4116418 0.3243396 0.5202316 0.2711677 0.8333333 0.4721428
## 311 PC1 0.6684105 0.0322316 0.7598925 0.3091080 0.5000000 0.4539285
## 104 cg07158503 0.4041543 0.4231865 0.5113692 0.3100314 0.5000000 0.4297483
## 95 cg06697310 0.4024778 0.2062702 0.4996506 0.5434923 0.3333333 0.3970448
## 148 cg11331837 0.2739320 0.2538522 0.3343517 0.7012849 0.3333333 0.3793508
## 107 cg07634717 0.2111241 0.4114846 0.3415290 0.4133254 0.5000000 0.3754926
## 55 cg03660162 0.1781057 0.5201740 0.3579817 0.3014357 0.5000000 0.3715394
## 285 cg24851651 0.2441837 0.3664597 0.3139634 0.5957764 0.3333333 0.3707433
## 140 cg11019791 0.1828977 0.3709727 0.2905242 0.7446664 0.1666667 0.3511455
## 249 cg20685672 0.2441862 0.2450869 0.3304414 0.2541069 0.6666667 0.3480976
## 298 cg26081710 0.3235607 0.2004048 0.4029228 0.3109325 0.5000000 0.3475642
## 176 cg14168080 0.3058494 0.4150016 0.3628690 0.4784251 0.1666667 0.3457624
## 58 cg03749159 0.1366550 0.3271771 0.1890326 0.7288038 0.3333333 0.3430004
## 248 cg20678988 0.2435940 0.3749217 0.3480510 0.2445465 0.5000000 0.3422226
## 63 cg04156077 0.2679896 0.2788390 0.2897414 0.3698890 0.5000000 0.3412918
## 241 cg19503462 0.2158285 0.2777136 0.3310961 0.3808573 0.5000000 0.3410991
## 301 cg26853071 0.2101158 0.1135166 0.2653192 0.4412341 0.6666667 0.3393705
# Top Number of common important features' name
top_Num_combined_importance_Avg_ordered_Nam <- top_Num_combined_importance_Avg_ordered$Feature
print(top_Num_combined_importance_Avg_ordered_Nam)
## [1] "cg23432430" "PC3" "PC2" "cg00962106" "PC1" "cg07158503" "cg06697310" "cg11331837" "cg07634717" "cg03660162" "cg24851651" "cg11019791" "cg20685672" "cg26081710" "cg14168080"
## [16] "cg03749159" "cg20678988" "cg04156077" "cg19503462" "cg26853071"
Visualize the average feature importance with a bar plot:
ggplot(combined_importance_Avg_ordered, aes(x = reorder(Feature, Average_Importance), y = Average_Importance)) +
geom_bar(stat = "identity") +
coord_flip() + # Flip coordinates to make it horizontal
labs(title = "Feature Importance Sorted by Average Value",
x = "Feature",
y = "Average Importance") +
theme_minimal()
Visualize the top features’ average importance with a bar plot:
ggplot(top_Num_combined_importance_Avg_ordered, aes(x = reorder(Feature, Average_Importance), y = Average_Importance)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = paste("Top",NUM_COMMON_FEATURES,"Feature Importance Sorted by Average Value"),
x = "Feature",
y = "Average Importance") +
theme_minimal()
The following shows selection of the top N important features based on a specific quantile of importance (here we use the median, i.e., the 50% quantile).
Let’s create a new data frame with different quantiles of feature importance across the models,
order it by the 50% quantile from high to low, and select the top features based on that.
quantiles <- t(apply(combined_importance[,-1], 1, function(x) quantile(x, probs = c(0,0.25, 0.5, 0.75,1))))
combined_importance_quantiles <- cbind(Feature = combined_importance$Feature, quantiles)
combined_importance_quantiles <- as.data.frame(combined_importance_quantiles)
combined_importance_quantiles$`50%` <- as.numeric(combined_importance_quantiles$`50%`)
combined_importance_quantiles$`0%` <- as.numeric(combined_importance_quantiles$`0%`)
combined_importance_quantiles$`25%` <- as.numeric(combined_importance_quantiles$`25%`)
combined_importance_quantiles$`75%` <- as.numeric(combined_importance_quantiles$`75%`)
combined_importance_quantiles$`100%` <- as.numeric(combined_importance_quantiles$`100%`)
# Sort by median importance (50th percentile)
combined_importance_quantiles <- combined_importance_quantiles[order(-combined_importance_quantiles$`50%`), ]
head(combined_importance_quantiles)
top_Num_median_features_imp <- head(combined_importance_quantiles,n = NUM_COMMON_FEATURES)
print(top_Num_median_features_imp)
## Feature 0% 25% 50% 75% 100%
## 275 cg23432430 0.491990101 0.62272860 1.0000000 1.0000000 1.0000000
## 313 PC3 0.000000000 0.16666667 0.5539209 1.0000000 1.0000000
## 1 age.now 0.003392297 0.00416345 0.5000000 0.5289698 0.6349744
## 311 PC1 0.032231598 0.30910796 0.5000000 0.6684105 0.7598925
## 312 PC2 0.333333333 0.38044700 0.4308152 0.5468701 0.8452667
## 104 cg07158503 0.310031430 0.40415432 0.4231865 0.5000000 0.5113692
## 19 cg00962106 0.271167721 0.32433959 0.4116418 0.5202316 0.8333333
## 107 cg07634717 0.211124130 0.34152897 0.4114846 0.4133254 0.5000000
## 95 cg06697310 0.206270193 0.33333333 0.4024778 0.4996506 0.5434923
## 176 cg14168080 0.166666667 0.30584936 0.3628690 0.4150016 0.4784251
## 55 cg03660162 0.178105731 0.30143569 0.3579817 0.5000000 0.5201740
## 33 cg02225060 0.014823116 0.16666667 0.3549348 0.4784559 0.5342408
## 106 cg07504457 0.166666667 0.27280725 0.3484476 0.3579963 0.4223343
## 134 cg10701746 0.036496597 0.33973960 0.3483977 0.4248608 0.5000000
## 248 cg20678988 0.243593957 0.24454648 0.3480510 0.3749217 0.5000000
## 121 cg09015880 0.015224654 0.33333333 0.3367794 0.3548429 0.3801625
## 242 cg19799454 0.004133443 0.10685290 0.3344244 0.4012688 0.6666667
## 2 cg00004073 0.252321139 0.26934952 0.3333333 0.3687430 0.3932005
## 6 cg00154902 0.205862688 0.22374844 0.3333333 0.3571489 0.4724526
## 46 cg02887598 0.068421117 0.29894356 0.3333333 0.3737771 0.4329452
top_Num_median_features_Name<-top_Num_median_features_imp$Feature
print(top_Num_median_features_Name)
## [1] "cg23432430" "PC3" "age.now" "PC1" "PC2" "cg07158503" "cg00962106" "cg07634717" "cg06697310" "cg14168080" "cg03660162" "cg02225060" "cg07504457" "cg10701746" "cg20678988"
## [16] "cg09015880" "cg19799454" "cg00004073" "cg00154902" "cg02887598"
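Note that `cbind()` on a character vector and a numeric matrix coerces everything to character, which is why the five `as.numeric()` conversions above are needed. A minimal sketch, on a made-up toy matrix, of an alternative that avoids the coercion by building the data frame directly:

```r
# Toy importance matrix: 3 features x 2 models (hypothetical values)
imp <- matrix(c(0.2, 0.5, 0.9,
                0.4, 0.1, 0.8),
              nrow = 3,
              dimnames = list(NULL, c("m1", "m2")))

# Row-wise quantiles; t() because apply() returns one column per input row
q <- t(apply(imp, 1, quantile, probs = c(0, 0.25, 0.5, 0.75, 1)))

# data.frame() keeps the quantile columns numeric, so no as.numeric() is needed
df <- data.frame(Feature = c("f1", "f2", "f3"), q, check.names = FALSE)
str(df$`50%`)  # numeric, not character
```

With this approach the five per-column conversions and the intermediate character matrix both disappear.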
Visualize the importance distributions with a box plot.
library(tidyr)
long_df <- pivot_longer(combined_importance_quantiles,
cols = c(`0%`, `25%`, `50%`, `75%`, `100%`),
names_to = "Quantile",
values_to = "Importance")
ggplot(long_df, aes(x = reorder(Feature, Importance), y = Importance)) +
geom_boxplot() +
coord_flip() +
labs(title = "Distribution of Feature Importances",
x = "Feature",
y = "Importance") +
theme_minimal()
Visualize the top features’ importance distributions with a box plot.
library(tidyr)
long_df <- pivot_longer(top_Num_median_features_imp,
cols = c(`0%`, `25%`, `50%`, `75%`, `100%`),
names_to = "Quantile",
values_to = "Importance")
ggplot(long_df, aes(x = reorder(Feature, Importance), y = Importance)) +
geom_boxplot() +
coord_flip() +
labs(
title = paste("Distribution of Top",NUM_COMMON_FEATURES,"Feature Importance Sorted by Median Value"),
x = "Feature",
y = "Importance") +
theme_minimal()
The frequency / common feature importance selection proceeds as follows:
n_select_frequencyWay <- NUM_COMMON_FEATURES_SET_Frequency
combined_importance_freq_ordered_df<-combined_importance_Avg_ordered
# LRM
## All_impAvg_orderby_LRM
All_impAvg_orderby_LRM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_LRM1),]
## top_impAvg_orderby_LRM
top_impAvg_orderby_LRM <- head(All_impAvg_orderby_LRM,n = n_select_frequencyWay)
top_impAvg_orderby_LRM_NAME <- top_impAvg_orderby_LRM$Feature
# XGB
## All_impAvg_orderby_XGB
All_impAvg_orderby_XGB <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_XGB),]
## top_impAvg_orderby_XGB
top_impAvg_orderby_XGB <- head(All_impAvg_orderby_XGB,n = n_select_frequencyWay)
top_impAvg_orderby_XGB_NAME <- top_impAvg_orderby_XGB$Feature
# ENM
## all_impAvg_orderby_ENM
All_impAvg_orderby_ENM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_ENM1),]
## top_impAvg_orderby_ENM
top_impAvg_orderby_ENM <- head(All_impAvg_orderby_ENM,n = n_select_frequencyWay)
top_impAvg_orderby_ENM_NAME <- top_impAvg_orderby_ENM$Feature
# RF
## all_impAvg_orderby_RF
All_impAvg_orderby_RF <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_RF),]
## top_impAvg_orderby_RF
top_impAvg_orderby_RF <- head(All_impAvg_orderby_RF,n = n_select_frequencyWay)
top_impAvg_orderby_RF_NAME <- top_impAvg_orderby_RF$Feature
# SVM
## all_impAvg_orderby_SVM
All_impAvg_orderby_SVM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_SVM),]
## top_impAvg_orderby_SVM
top_impAvg_orderby_SVM <- head(All_impAvg_orderby_SVM,n = n_select_frequencyWay)
top_impAvg_orderby_SVM_NAME <- top_impAvg_orderby_SVM$Feature
# Combine all features into a unique collection
all_features <- unique(c(top_impAvg_orderby_LRM_NAME, top_impAvg_orderby_XGB_NAME, top_impAvg_orderby_ENM_NAME,top_impAvg_orderby_RF_NAME,top_impAvg_orderby_SVM_NAME))
models<-c("LRM","XGB","ENM","RF","SVM")
feature_matrix <- matrix(0, nrow = length(all_features), ncol = length(models),
dimnames = list(all_features, models))
# Fill the matrix indicating presence (1) or absence (0) of each feature in each model
for (feature in all_features) {
feature_matrix[feature, "LRM"] <-
as.integer(feature %in% top_impAvg_orderby_LRM_NAME)
feature_matrix[feature, "XGB"] <-
as.integer(feature %in% top_impAvg_orderby_XGB_NAME)
feature_matrix[feature, "ENM"] <-
as.integer(feature %in% top_impAvg_orderby_ENM_NAME)
feature_matrix[feature, "RF"] <-
as.integer(feature %in% top_impAvg_orderby_RF_NAME)
feature_matrix[feature, "SVM"] <-
as.integer(feature %in% top_impAvg_orderby_SVM_NAME)
}
feature_df <- as.data.frame(feature_matrix)
print(head(feature_df))
## LRM XGB ENM RF SVM
## PC3 1 0 1 1 0
## PC1 1 0 1 0 1
## cg23432430 1 1 1 1 1
## cg09727210 1 0 1 0 0
## PC2 1 1 1 1 0
## cg00962106 1 0 1 0 1
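The explicit per-model loop above can equivalently be sketched with `sapply()` over a named list of the per-model top-feature vectors (the feature names here are illustrative, not from the data):

```r
# Hypothetical per-model top-feature name vectors
tops <- list(LRM = c("a", "b", "c"),
             XGB = c("b", "c", "d"),
             ENM = c("a", "c", "e"))

all_features <- unique(unlist(tops))

# 0/1 presence matrix: one row per feature, one column per model
feature_matrix <- sapply(tops, function(nm) as.integer(all_features %in% nm))
rownames(feature_matrix) <- all_features
feature_matrix
```

This scales to any number of models without adding a block of code per model.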
For a quick read, we count how many times each feature appears across the models by computing the row sums, and add that count as a column to the data frame.
feature_df$Total_Count <- rowSums(feature_df[,1:5])
feature_df <- feature_df[order(-feature_df$Total_Count), ]
frequency_feature_df_RAW_ordered<-feature_df
print(feature_df)
## LRM XGB ENM RF SVM Total_Count
## cg23432430 1 1 1 1 1 5
## PC2 1 1 1 1 0 4
## cg07158503 1 1 1 0 1 4
## PC3 1 0 1 1 0 3
## PC1 1 0 1 0 1 3
## cg00962106 1 0 1 0 1 3
## cg06697310 1 0 1 1 0 3
## cg26081710 1 0 1 0 1 3
## cg09727210 1 0 1 0 0 2
## cg02225060 1 0 1 0 0 2
## cg09015880 1 0 1 0 0 2
## cg16338321 1 0 1 0 0 2
## cg00819121 1 0 1 0 0 2
## cg00415024 1 0 0 1 0 2
## cg21757617 1 0 1 0 0 2
## cg14168080 1 1 0 0 0 2
## cg02887598 1 0 1 0 0 2
## cg05064044 1 0 1 0 0 2
## cg03660162 0 1 0 0 1 2
## cg07634717 0 1 0 0 1 2
## cg19799454 0 1 0 0 1 2
## cg20678988 0 1 0 0 1 2
## cg11019791 0 1 0 1 0 2
## cg10701746 1 0 0 0 0 1
## cg01910713 1 0 0 0 0 1
## age.now 0 1 0 0 0 1
## cg11438323 0 1 0 0 0 1
## cg11540596 0 1 0 0 0 1
## cg17002719 0 1 0 0 0 1
## cg09120722 0 1 0 0 0 1
## cg17002338 0 1 0 0 0 1
## cg11227702 0 1 0 0 0 1
## cg18816397 0 1 0 0 0 1
## cg02122327 0 1 0 0 0 1
## cg13573375 0 1 0 0 0 1
## cg03088219 0 1 0 0 0 1
## cg06277607 0 0 1 0 0 1
## cg27272246 0 0 1 0 0 1
## cg00004073 0 0 1 0 0 1
## cg17429539 0 0 1 0 0 1
## cg03749159 0 0 0 1 0 1
## cg11331837 0 0 0 1 0 1
## cg21697769 0 0 0 1 0 1
## cg01008088 0 0 0 1 0 1
## cg04768387 0 0 0 1 0 1
## cg16431720 0 0 0 1 0 1
## cg12784167 0 0 0 1 0 1
## cg23159970 0 0 0 1 0 1
## cg24851651 0 0 0 1 0 1
## cg17042243 0 0 0 1 0 1
## cg09451339 0 0 0 1 0 1
## cg17386240 0 0 0 1 0 1
## cg04109990 0 0 0 1 0 1
## cg14192979 0 0 0 1 0 1
## cg12333628 0 0 0 0 1 1
## cg20685672 0 0 0 0 1 1
## cg26853071 0 0 0 0 1 1
## cg24883219 0 0 0 0 1 1
## cg06833284 0 0 0 0 1 1
## cg03600007 0 0 0 0 1 1
## cg01280698 0 0 0 0 1 1
## cg13226272 0 0 0 0 1 1
## cg15775217 0 0 0 0 1 1
## cg04156077 0 0 0 0 1 1
## cg19503462 0 0 0 0 1 1
all_features <- union(combined_importance_freq_ordered_df$Feature, rownames(feature_df))
# Note: the combined importance table used here is the one before filtering
# Combine the tables based on the common (frequency) feature selection method
# If a feature from the importance table is absent here, add it with all values set to zero
feature_df_full <- data.frame(Feature = all_features)
feature_df_full <- merge(feature_df_full, feature_df, by.x = "Feature", by.y = "row.names", all.x = TRUE)
feature_df_full[is.na(feature_df_full)] <- 0
# For top_impAvg_ordered
all_impAvg_ordered_full <- data.frame(Feature = all_features)
all_impAvg_ordered_full <- merge(combined_importance_freq_ordered_df,all_impAvg_ordered_full, by.x = "Feature", by.y = "Feature", all.x = TRUE)
all_impAvg_ordered_full[is.na(all_impAvg_ordered_full)] <- 0
all_combined_df_impAvg <- merge(feature_df_full, all_impAvg_ordered_full, by = "Feature", all = TRUE)
print(head(feature_df_full))
## Feature LRM XGB ENM RF SVM Total_Count
## 1 age.now 0 1 0 0 0 1
## 2 cg00004073 0 0 1 0 0 1
## 3 cg00084271 0 0 0 0 0 0
## 4 cg00086247 0 0 0 0 0 0
## 5 cg00146240 0 0 0 0 0 0
## 6 cg00154902 0 0 0 0 0 0
print(head(all_impAvg_ordered_full))
## Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 1 age.now 0.00416345 0.63497444 0.003392297 0.5289698 0.5000000 0.3343000
## 2 cg00004073 0.26934952 0.25232114 0.368743031 0.3932005 0.3333333 0.3233895
## 3 cg00084271 0.22358222 0.08451443 0.272790932 0.5066986 0.1666667 0.2508506
## 4 cg00086247 0.00000000 0.15625070 0.068094153 0.2757605 0.0000000 0.1000211
## 5 cg00146240 0.08729337 0.00000000 0.195466203 0.5233594 0.1666667 0.1945571
## 6 cg00154902 0.20586269 0.35714894 0.223748437 0.4724526 0.3333333 0.3185092
print(head(all_combined_df_impAvg))
## Feature LRM XGB ENM RF SVM Total_Count Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 1 age.now 0 1 0 0 0 1 0.00416345 0.63497444 0.003392297 0.5289698 0.5000000 0.3343000
## 2 cg00004073 0 0 1 0 0 1 0.26934952 0.25232114 0.368743031 0.3932005 0.3333333 0.3233895
## 3 cg00084271 0 0 0 0 0 0 0.22358222 0.08451443 0.272790932 0.5066986 0.1666667 0.2508506
## 4 cg00086247 0 0 0 0 0 0 0.00000000 0.15625070 0.068094153 0.2757605 0.0000000 0.1000211
## 5 cg00146240 0 0 0 0 0 0 0.08729337 0.00000000 0.195466203 0.5233594 0.1666667 0.1945571
## 6 cg00154902 0 0 0 0 0 0 0.20586269 0.35714894 0.223748437 0.4724526 0.3333333 0.3185092
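The merge-and-fill pattern above (a left join on Feature, then zero-filling features absent from the frequency table) can be illustrated on toy data (the names are made up):

```r
full   <- data.frame(Feature = c("f1", "f2", "f3"))
counts <- data.frame(Feature = c("f1", "f3"), Total_Count = c(2, 1))

# Left join keeps every feature; unmatched rows get NA in Total_Count
m <- merge(full, counts, by = "Feature", all.x = TRUE)

# Zero-fill: features never selected by any model get count 0
m$Total_Count[is.na(m$Total_Count)] <- 0
m
```

Note that `merge()` sorts the result by the `by` column by default, which is why the merged tables above are no longer in importance order.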
Choose the mutually important features: those that appear in at least half of the models’ (i.e., at least 3 of the 5 in our case) top selected important-feature lists.
if(METHOD_FEATURE_FLAG == 3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG == 5 || METHOD_FEATURE_FLAG==6){
df_process_mutual_FeatureName <- rownames(feature_df[feature_df$Total_Count>=3,])
df_process_mutual<-processed_data[,c("DX",df_process_mutual_FeatureName)]
print(paste("The number of final used features of common importance method:", length(df_process_mutual) - 1 ))
}
## [1] "The number of final used features of common importance method: 8"
if(METHOD_FEATURE_FLAG == 1){
df_process_mutual_FeatureName <- rownames(feature_df[feature_df$Total_Count>=3,])
df_process_mutual<-processed_data_m1[,c("DX",df_process_mutual_FeatureName)]
print(paste("The number of final used features of common importance method:", length(df_process_mutual) - 1 ))
}
print(df_process_mutual_FeatureName)
## [1] "cg23432430" "PC2" "cg07158503" "PC3" "PC1" "cg00962106" "cg06697310" "cg26081710"
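The threshold of 3 above is hardcoded for five models; a sketch of computing “at least half of the models” generically (the `total_count` values are illustrative):

```r
models <- c("LRM", "XGB", "ENM", "RF", "SVM")

# "At least half of the models": ceiling(5 / 2) = 3 here
threshold <- ceiling(length(models) / 2)

# Toy Total_Count vector keyed by feature name
total_count <- c(cgA = 5, cgB = 3, cgC = 2, cgD = 1)
selected <- names(total_count)[total_count >= threshold]
selected
```

Deriving the threshold from `length(models)` keeps the selection consistent if models are later added or removed.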
Importance of these features:
Top_Frequency_Feature_importance <- combined_importance_freq_ordered_df[
combined_importance_freq_ordered_df$Feature %in% df_process_mutual_FeatureName,
]
print(Top_Frequency_Feature_importance)
## Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 275 cg23432430 0.4919901 1.0000000 0.6227286 1.0000000 1.0000000 0.8229437
## 313 PC3 1.0000000 0.0000000 1.0000000 0.5539209 0.1666667 0.5441175
## 312 PC2 0.4308152 0.3804470 0.8452667 0.5468701 0.3333333 0.5073465
## 19 cg00962106 0.4116418 0.3243396 0.5202316 0.2711677 0.8333333 0.4721428
## 311 PC1 0.6684105 0.0322316 0.7598925 0.3091080 0.5000000 0.4539285
## 104 cg07158503 0.4041543 0.4231865 0.5113692 0.3100314 0.5000000 0.4297483
## 95 cg06697310 0.4024778 0.2062702 0.4996506 0.5434923 0.3333333 0.3970448
## 298 cg26081710 0.3235607 0.2004048 0.4029228 0.3109325 0.5000000 0.3475642
ggplot(Top_Frequency_Feature_importance, aes(x = reorder(Feature, Average_Importance), y = Average_Importance)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "Feature Importance Selected Based on Frequency Method and Sorted by Average Value",
x = "Feature",
y = "Average Importance") +
theme_minimal()
# Check whether every feature selected by the mutual (frequency) method is also in the mean method, and print any that are not
all(df_process_mutual_FeatureName %in% top_Num_combined_importance_Avg_ordered_Nam)
## [1] TRUE
Mutual_not_in_Mean <- setdiff(df_process_mutual_FeatureName, top_Num_combined_importance_Avg_ordered_Nam)
print(Mutual_not_in_Mean)
## character(0)
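The subset check above combines `all(x %in% y)` with `setdiff()`; on toy vectors (the names are made up) the pattern looks like:

```r
# Hypothetical feature-name vectors
mutual   <- c("cg1", "cg2")
mean_top <- c("cg1", "cg2", "cg3")

all(mutual %in% mean_top)  # TRUE when every mutual feature is in the mean list
setdiff(mutual, mean_top)  # character(0) in that case: nothing is missing
```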
Phenotype part data frame: “phenoticPart_RAW”
Raw merged data frame: “merged_df_raw”
Feature importance ordered by quantile: “combined_importance_quantiles”
Feature importance ordered by mean: “combined_importance_Avg_ordered”
Frequency / common feature data frames:
“frequency_feature_df_RAW_ordered”: the selected features’ frequencies, ordered by total frequency count.
“feature_df_full”: the frequencies of all features from the frequency-method steps; not ordered.
“all_combined_df_impAvg”: the combined table of frequency and feature importance; not ordered.
head(phenoticPart_RAW)
#
# save(NUM_COMMON_FEATURES,
# combined_importance_quantiles,
# combined_importance_Avg_ordered,
# frequency_feature_df_RAW_ordered,
# top_Num_median_features_Name,
# top_Num_combined_importance_Avg_ordered_Nam,
# file = "Part2_V8_08_top_features_5KCpGs.RData")
#
# save(processed_data_m3,processed_data_m3_df,AfterProcess_FeatureName_m3,file = "Part2_V8_08_BinaryMerged_5KCpGs.RData")
#
# save(phenoticPart_RAW, merged_df_raw, file = "PhenotypeAndMerged.RData")
The feature selection method settings:
Number_fea_input <- INPUT_NUMBER_FEATURES
Flag_8mean <- INPUT_Method_Mean_Choose
Flag_8median <- INPUT_Method_Median_Choose
Flag_8Fequency <- INPUT_Method_Frequency_Choose
print(paste("the Top number of features here is set to:", Number_fea_input))
## [1] "the Top number of features here is set to: 250"
Flag_8mean
## [1] TRUE
Flag_8median
## [1] TRUE
Flag_8Fequency
## [1] TRUE
selected_impAvg_ordered <- head(combined_importance_Avg_ordered,n = Number_fea_input)
print(head(selected_impAvg_ordered))
## Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 275 cg23432430 0.4919901 1.0000000 0.6227286 1.0000000 1.0000000 0.8229437
## 313 PC3 1.0000000 0.0000000 1.0000000 0.5539209 0.1666667 0.5441175
## 312 PC2 0.4308152 0.3804470 0.8452667 0.5468701 0.3333333 0.5073465
## 19 cg00962106 0.4116418 0.3243396 0.5202316 0.2711677 0.8333333 0.4721428
## 311 PC1 0.6684105 0.0322316 0.7598925 0.3091080 0.5000000 0.4539285
## 104 cg07158503 0.4041543 0.4231865 0.5113692 0.3100314 0.5000000 0.4297483
print(dim(selected_impAvg_ordered))
## [1] 250 7
selected_impAvg_ordered_NAME <- selected_impAvg_ordered$Feature
print(head(selected_impAvg_ordered_NAME))
## [1] "cg23432430" "PC3" "PC2" "cg00962106" "PC1" "cg07158503"
df_selected_Mean <- processed_dataFrame[,c("DX",selected_impAvg_ordered_NAME)]
print(head(df_selected_Mean))
## DX cg23432430 PC3 PC2 cg00962106 PC1 cg07158503 cg06697310 cg11331837 cg07634717 cg03660162 cg24851651 cg11019791 cg20685672 cg26081710 cg14168080
## 200223270003_R02C01 CI 0.9482702 -0.014043316 0.01470293 0.9124898 -0.214185447 0.5777146 0.8454609 0.03692842 0.7483382 0.8691767 0.03674702 0.8112324 0.6712101 0.8751040 0.4190123
## 200223270003_R03C01 CN 0.9455418 0.005055871 0.05745834 0.5375751 -0.172761185 0.6203543 0.8653044 0.57150125 0.8254434 0.5160770 0.05358297 0.7831231 0.7932091 0.9198212 0.4420256
## 200223270003_R06C01 CN 0.9418716 0.029143653 0.08372861 0.5040948 -0.003667305 0.6236025 0.2405168 0.03182862 0.8181246 0.9026304 0.05968923 0.4353250 0.6613646 0.8801892 0.4355521
## cg03749159 cg20678988 cg04156077 cg19503462 cg26853071 age.now cg11540596 cg00415024 cg10701746 cg00004073 cg11227702 cg19471911 cg09727210 cg00154902 cg17002719 cg07504457
## 200223270003_R02C01 0.9355921 0.8438718 0.7321883 0.7951675 0.4233820 82.4 0.9238951 0.4299553 0.4795503 0.02928535 0.86486075 0.6334393 0.4240111 0.5137741 0.04939181 0.7116230
## 200223270003_R03C01 0.9153921 0.8548886 0.6865805 0.4537684 0.7451354 78.6 0.8926595 0.3999122 0.4868342 0.02787198 0.49184121 0.8437175 0.8812928 0.8540746 0.40466475 0.6854539
## 200223270003_R06C01 0.9255807 0.7786685 0.8501188 0.6997359 0.4228079 80.4 0.8820252 0.7465084 0.4927257 0.64576463 0.02543724 0.6127952 0.8493743 0.8188126 0.51428089 0.7205633
## cg25879395 cg01008088 cg02225060 cg12543766 cg09120722 cg11787167 cg19799454 cg02887598 cg01128042 cg21697769 cg25208881 cg16779438 cg17386240 cg03088219 cg24883219 cg15535896
## 200223270003_R02C01 0.88130864 0.8424817 0.6828159 0.51028134 0.5878977 0.03853894 0.9178930 0.04020908 0.9113420 0.8946108 0.1851956 0.8826150 0.7473400 0.844002862 0.6430473 0.3382952
## 200223270003_R03C01 0.02603438 0.2417656 0.8265195 0.88741539 0.8287506 0.04673831 0.9106247 0.67073881 0.5328806 0.2822953 0.9092286 0.5466924 0.7144809 0.007435243 0.6822115 0.9253926
## 200223270003_R06C01 0.91060615 0.2618620 0.5209552 0.02818501 0.8793344 0.32564508 0.9066551 0.73408417 0.5222757 0.8698740 0.9265502 0.8629492 0.8074824 0.120155222 0.5296903 0.3320191
## cg16338321 cg21757617 cg18285382 cg17429539 cg10738648 cg02078724 cg09015880 cg20823859 cg18816397 cg16431720 cg06833284 cg23517115 cg11438323 cg02932958 cg08096656 cg05064044
## 200223270003_R02C01 0.5350242 0.03652647 0.3202927 0.7860900 0.44931577 0.3096774 0.5101716 0.9030711 0.5472925 0.7356099 0.9125144 0.2151144 0.4863471 0.7901008 0.9362594 0.5672851
## 200223270003_R03C01 0.8294062 0.44299089 0.2930577 0.7100923 0.49894016 0.2896133 0.8402106 0.6062985 0.4940355 0.8692449 0.9003482 0.9131440 0.8984559 0.4210489 0.9314878 0.5358875
## 200223270003_R06C01 0.4918708 0.44725379 0.8923595 0.7660838 0.05552024 0.2805612 0.8472063 0.8917348 0.5337018 0.8773137 0.6097933 0.8328364 0.8722772 0.3825995 0.4943033 0.5273964
## cg05234269 cg25169289 cg14710850 cg26679884 cg03600007 cg15098922 cg01921484 cg16715186 cg06961873 cg12240569 cg01910713 cg25712921 cg00648024 cg03982462 cg08745107 cg26983017
## 200223270003_R02C01 0.93848584 0.1100884 0.8048592 0.6793815 0.5658487 0.9286092 0.9098550 0.2742789 0.5335591 0.82772064 0.8573169 0.2829848 0.51410972 0.8562777 0.02921338 0.89868232
## 200223270003_R03C01 0.57461229 0.7667174 0.8090950 0.1848705 0.6018832 0.9027517 0.9093137 0.7946153 0.5472606 0.02690547 0.8538850 0.6220919 0.40202875 0.6023731 0.78542320 0.03145466
## 200223270003_R06C01 0.02467208 0.2264993 0.8285902 0.1701734 0.8611166 0.8525611 0.9204487 0.8124316 0.9415177 0.46030640 0.8110366 0.6384003 0.05579011 0.8778458 0.02709928 0.84677625
## cg00084271 cg16858433 cg06371647 cg26846609 cg15184869 cg13573375 cg04831745 cg22931151 cg18918831 cg07640670 cg15600437 cg01280698 cg12689021 cg27577781 cg13405878 cg22666875
## 200223270003_R02C01 0.8103611 0.9184356 0.8336894 0.48860949 0.8622328 0.8670419 0.61984995 0.9311023 0.4891660 0.58296513 0.4885353 0.8985067 0.7706828 0.8143535 0.4549662 0.8177182
## 200223270003_R03C01 0.7877006 0.9194211 0.8198684 0.04878986 0.8996252 0.1733934 0.71214149 0.9356702 0.5333801 0.55225610 0.4894487 0.8846201 0.7449475 0.8113185 0.7858042 0.8291957
## 200223270003_R06C01 0.7706165 0.9271632 0.8069537 0.48026945 0.8688117 0.8888246 0.06871768 0.9328614 0.6406575 0.04058533 0.8551374 0.8847132 0.7872237 0.8144274 0.7583938 0.3694180
## cg16536985 cg16202259 cg18857647 cg22305850 cg27224751 cg09247979 cg12333628 cg16571124 cg03979311 cg12421087 cg15700429 cg13739190 cg00819121 cg25436480 cg04768387 cg24634455
## 200223270003_R02C01 0.5789643 0.9548726 0.8582332 0.03361934 0.44503947 0.5070956 0.9227884 0.9282854 0.86644909 0.5647607 0.7879010 0.8510103 0.9207001 0.8425160 0.3131047 0.7796391
## 200223270003_R03C01 0.5418687 0.3713483 0.8394132 0.57522232 0.03214912 0.5706177 0.9092861 0.9206431 0.06199853 0.5399655 0.9114530 0.8358482 0.9281472 0.4994032 0.9465814 0.5188241
## 200223270003_R06C01 0.8392044 0.4852461 0.2647491 0.58548744 0.83123722 0.5090215 0.5084647 0.9276842 0.72615553 0.5400348 0.8838233 0.8419471 0.9327211 0.3494312 0.9098563 0.5325725
## cg11133939 cg17042243 cg22542451 cg01608425 cg06864789 cg06880438 cg13387643 cg12702014 cg03737947 cg02823329 cg00696044 cg06960717 cg20673830 cg25649515 cg10681981 cg15633912
## 200223270003_R02C01 0.1282694 0.2502905 0.5884356 0.9030410 0.05369415 0.8285145 0.4229959 0.7704049 0.91824910 0.9462397 0.55608424 0.7030978 0.2422052 0.9279829 0.7035090 0.1605530
## 200223270003_R03C01 0.5920898 0.2933475 0.8337068 0.9264388 0.46053125 0.7988881 0.4200273 0.7848681 0.92067153 0.6464005 0.07552381 0.7653402 0.6881735 0.9235753 0.7382662 0.9333421
## 200223270003_R06C01 0.5127706 0.2725457 0.8125084 0.8887753 0.87513655 0.7839538 0.4161488 0.8065993 0.03638091 0.9633930 0.79270858 0.7206218 0.2134634 0.5895839 0.6971989 0.8737362
## cg02668233 cg27272246 cg18150287 cg18339359 cg04718469 cg01933473 cg02122327 cg18993517 cg02495179 cg02356645 cg09216282 cg09584650 cg00512739 cg23352245 cg12776173 cg19301366
## 200223270003_R02C01 0.4708431 0.8615873 0.7685695 0.8824858 0.8687522 0.2589014 0.38940091 0.2091538 0.6813307 0.5105903 0.9349248 0.08230254 0.9337648 0.9377232 0.1038804 0.8831393
## 200223270003_R03C01 0.8841930 0.8705287 0.7519166 0.9040272 0.7256813 0.6726133 0.37769608 0.2665896 0.7373055 0.5833923 0.9244259 0.09661586 0.8863895 0.9375774 0.8730635 0.8072679
## 200223270003_R06C01 0.4575646 0.8103777 0.2501173 0.8552121 0.8521881 0.2642560 0.04017909 0.2574003 0.5588114 0.5701428 0.9263996 0.52399749 0.9242748 0.5932742 0.7009491 0.8796022
## cg25758034 cg04316537 cg14687298 cg13226272 cg13372276 cg12556569 cg06277607 cg17002338 cg24307368 cg14627380 cg10091792 cg08584917 cg18819889 cg24697433 cg03084184 cg23159970
## 200223270003_R02C01 0.6114028 0.8074830 0.04206702 0.02637249 0.04888111 0.06218231 0.10744587 0.9286251 0.64323677 0.9455369 0.8670733 0.5663205 0.9156157 0.9243095 0.8162981 0.61817246
## 200223270003_R03C01 0.6649219 0.8453340 0.14813581 0.54100016 0.62396373 0.03924599 0.09353494 0.2684163 0.34980461 0.9258964 0.5864221 0.9019732 0.9004455 0.6808390 0.7877128 0.57492600
## 200223270003_R06C01 0.2393844 0.4351695 0.24260002 0.44370701 0.59693465 0.48636893 0.09504696 0.2811103 0.02720398 0.5789898 0.6087997 0.9187789 0.9054439 0.6384606 0.4546397 0.03288909
## cg22112152 cg12784167 cg08198851 cg17129965 cg00939409 cg08788093 cg09451339 cg20078646 cg10788927 cg16089727 cg00146240 cg15775217 cg18526121 cg01662749 cg14192979 cg03672288
## 200223270003_R02C01 0.8476101 0.81503498 0.6578905 0.8972140 0.2652180 0.03911678 0.2243746 0.06198170 0.8973154 0.86748697 0.6336151 0.5707441 0.4519781 0.3506201 0.06336040 0.9235592
## 200223270003_R03C01 0.8014136 0.02811410 0.6578186 0.8806673 0.8882671 0.60934160 0.2340702 0.89537412 0.2021398 0.54996692 0.8957183 0.9168327 0.4762313 0.2510946 0.06019651 0.6718625
## 200223270003_R06C01 0.7897897 0.03073269 0.1272153 0.8857237 0.8842646 0.88380243 0.8921284 0.08725521 0.2053075 0.05876736 0.1433218 0.6042521 0.4833367 0.8061480 0.52114282 0.9007629
## cg25306893 cg05392160 cg05321907 cg25277809 cg05876883 cg06715136 cg06483046 cg14307563 cg14170504 cg04497611 cg24139837 cg05161773 cg05593887 cg11286989 cg10240127 cg27160885
## 200223270003_R02C01 0.6265392 0.9328933 0.2880477 0.1632342 0.9039064 0.3400192 0.04383925 0.1855966 0.54915621 0.9086359 0.07404605 0.4120912 0.5939220 0.7590008 0.9250553 0.2231606
## 200223270003_R03C01 0.8330282 0.2576881 0.1782629 0.4913711 0.9223308 0.9259109 0.50720277 0.8916957 0.02236650 0.8818513 0.04183445 0.4154907 0.5766550 0.8533989 0.9403255 0.8263885
## 200223270003_R06C01 0.6175380 0.8920726 0.8427929 0.5952124 0.4697980 0.9079807 0.89604910 0.8750052 0.02988245 0.5853116 0.05657120 0.8526849 0.9148338 0.7313884 0.9056974 0.2121179
## cg01549082 cg04412904 cg14532717 cg06118351 cg22535849 cg11706829 cg00322003 cg08554146 cg02627240 cg18029737 cg17723206 cg03549208 cg21986118 cg05850457 cg09785377 cg14293999
## 200223270003_R02C01 0.2924138 0.05088595 0.5732280 0.3633940 0.8847704 0.8897234 0.1759911 0.8982080 0.66706843 0.9100454 0.92881042 0.9014487 0.6658175 0.8183013 0.9162088 0.2836710
## 200223270003_R03C01 0.7065693 0.07717659 0.1107638 0.4714860 0.8609966 0.5444785 0.5702070 0.8963074 0.57129408 0.9016634 0.48556255 0.8381784 0.6571296 0.8313023 0.9226292 0.9172023
## 200223270003_R06C01 0.2895440 0.08253743 0.6273416 0.8655962 0.8808022 0.5669449 0.3077122 0.8213878 0.05309659 0.7376586 0.01765023 0.9097817 0.7034445 0.8161364 0.6405193 0.9168166
## cg07138269 cg15985500 cg14780448 cg04124201 cg17738613 cg17906851 cg22169467 cg22071943 cg20981163 cg10039445 cg02246922 cg08896901 cg02631626 cg11247378 cg08857872 cg00295418
## 200223270003_R02C01 0.5002290 0.8555262 0.9119141 0.8686421 0.6879612 0.9488392 0.3095010 0.8705217 0.8990628 0.8833873 0.7301201 0.3581911 0.6280766 0.1591185 0.3395280 0.44954665
## 200223270003_R03C01 0.9426707 0.8312198 0.6702102 0.3308589 0.6582258 0.9529718 0.2978585 0.2442648 0.9264076 0.8954055 0.9447019 0.2467071 0.1951736 0.7874849 0.8181845 0.48471295
## 200223270003_R06C01 0.5057781 0.8492103 0.6207355 0.3241613 0.1022257 0.6462151 0.8955853 0.2644581 0.4874651 0.8832807 0.7202230 0.9225209 0.2699849 0.4807942 0.2970779 0.02004532
## cg14507637 cg18949721 cg11187460 cg12146221 cg08041188 cg04867412 cg00345083 cg11268585 cg21388339 cg12228670 cg23916408 cg26901661 cg21243064 cg06403901 cg15730644 cg00322820
## 200223270003_R02C01 0.9051258 0.2334245 0.03672179 0.2049284 0.7752456 0.04304823 0.47960968 0.2521544 0.2756268 0.8632174 0.1942275 0.8951971 0.5191606 0.92790690 0.4803181 0.4869764
## 200223270003_R03C01 0.9009460 0.2437792 0.92516409 0.1814927 0.3201255 0.87967997 0.50833875 0.8535791 0.2102269 0.8496212 0.9154993 0.8754981 0.9167649 0.04783341 0.4353906 0.4858988
## 200223270003_R06C01 0.9013686 0.2523095 0.03109553 0.8619250 0.7900939 0.44971146 0.03929249 0.9121931 0.7649181 0.8738949 0.8886255 0.9021064 0.4862205 0.05253626 0.8763048 0.4754313
## cg04645024 cg24643105 cg03221390 cg21139150 cg17131279 cg15501526 cg13653328 cg24470466 cg23836570 cg13038195 cg04664583
## 200223270003_R02C01 0.7366541 0.5303418 0.5859063 0.01853264 0.1900637 0.6362531 0.9245434 0.7725300 0.58688450 0.45882213 0.5572814
## 200223270003_R03C01 0.8454827 0.5042688 0.9180706 0.43223243 0.7048637 0.6319253 0.5122938 0.9041432 0.54259383 0.02740132 0.5881190
## 200223270003_R06C01 0.0871902 0.9383050 0.6399867 0.43772680 0.1492861 0.7435100 0.9362798 0.1206738 0.03267304 0.46284376 0.9352717
## [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
dim(df_selected_Mean)
## [1] 648 251
print(selected_impAvg_ordered_NAME)
## [1] "cg23432430" "PC3" "PC2" "cg00962106" "PC1" "cg07158503" "cg06697310" "cg11331837" "cg07634717" "cg03660162" "cg24851651" "cg11019791" "cg20685672" "cg26081710" "cg14168080"
## [16] "cg03749159" "cg20678988" "cg04156077" "cg19503462" "cg26853071" "age.now" "cg11540596" "cg00415024" "cg10701746" "cg00004073" "cg11227702" "cg19471911" "cg09727210" "cg00154902" "cg17002719"
## [31] "cg07504457" "cg25879395" "cg01008088" "cg02225060" "cg12543766" "cg09120722" "cg11787167" "cg19799454" "cg02887598" "cg01128042" "cg21697769" "cg25208881" "cg16779438" "cg17386240" "cg03088219"
## [46] "cg24883219" "cg15535896" "cg16338321" "cg21757617" "cg18285382" "cg17429539" "cg10738648" "cg02078724" "cg09015880" "cg20823859" "cg18816397" "cg16431720" "cg06833284" "cg23517115" "cg11438323"
## [61] "cg02932958" "cg08096656" "cg05064044" "cg05234269" "cg25169289" "cg14710850" "cg26679884" "cg03600007" "cg15098922" "cg01921484" "cg16715186" "cg06961873" "cg12240569" "cg01910713" "cg25712921"
## [76] "cg00648024" "cg03982462" "cg08745107" "cg26983017" "cg00084271" "cg16858433" "cg06371647" "cg26846609" "cg15184869" "cg13573375" "cg04831745" "cg22931151" "cg18918831" "cg07640670" "cg15600437"
## [91] "cg01280698" "cg12689021" "cg27577781" "cg13405878" "cg22666875" "cg16536985" "cg16202259" "cg18857647" "cg22305850" "cg27224751" "cg09247979" "cg12333628" "cg16571124" "cg03979311" "cg12421087"
## [106] "cg15700429" "cg13739190" "cg00819121" "cg25436480" "cg04768387" "cg24634455" "cg11133939" "cg17042243" "cg22542451" "cg01608425" "cg06864789" "cg06880438" "cg13387643" "cg12702014" "cg03737947"
## [121] "cg02823329" "cg00696044" "cg06960717" "cg20673830" "cg25649515" "cg10681981" "cg15633912" "cg02668233" "cg27272246" "cg18150287" "cg18339359" "cg04718469" "cg01933473" "cg02122327" "cg18993517"
## [136] "cg02495179" "cg02356645" "cg09216282" "cg09584650" "cg00512739" "cg23352245" "cg12776173" "cg19301366" "cg25758034" "cg04316537" "cg14687298" "cg13226272" "cg13372276" "cg12556569" "cg06277607"
## [151] "cg17002338" "cg24307368" "cg14627380" "cg10091792" "cg08584917" "cg18819889" "cg24697433" "cg03084184" "cg23159970" "cg22112152" "cg12784167" "cg08198851" "cg17129965" "cg00939409" "cg08788093"
## [166] "cg09451339" "cg20078646" "cg10788927" "cg16089727" "cg00146240" "cg15775217" "cg18526121" "cg01662749" "cg14192979" "cg03672288" "cg25306893" "cg05392160" "cg05321907" "cg25277809" "cg05876883"
## [181] "cg06715136" "cg06483046" "cg14307563" "cg14170504" "cg04497611" "cg24139837" "cg05161773" "cg05593887" "cg11286989" "cg10240127" "cg27160885" "cg01549082" "cg04412904" "cg14532717" "cg06118351"
## [196] "cg22535849" "cg11706829" "cg00322003" "cg08554146" "cg02627240" "cg18029737" "cg17723206" "cg03549208" "cg21986118" "cg05850457" "cg09785377" "cg14293999" "cg07138269" "cg15985500" "cg14780448"
## [211] "cg04124201" "cg17738613" "cg17906851" "cg22169467" "cg22071943" "cg20981163" "cg10039445" "cg02246922" "cg08896901" "cg02631626" "cg11247378" "cg08857872" "cg00295418" "cg14507637" "cg18949721"
## [226] "cg11187460" "cg12146221" "cg08041188" "cg04867412" "cg00345083" "cg11268585" "cg21388339" "cg12228670" "cg23916408" "cg26901661" "cg21243064" "cg06403901" "cg15730644" "cg00322820" "cg04645024"
## [241] "cg24643105" "cg03221390" "cg21139150" "cg17131279" "cg15501526" "cg13653328" "cg24470466" "cg23836570" "cg13038195" "cg04664583"
output_mean_process<-processed_data[,c("DX",selected_impAvg_ordered_NAME)]
print(head(output_mean_process))
## # A tibble: 6 × 251
## DX cg23432430 PC3 PC2 cg00962106 PC1 cg07158503 cg06697310 cg11331837 cg07634717 cg03660162 cg24851651 cg11019791 cg20685672 cg26081710 cg14168080 cg03749159 cg20678988 cg04156077
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CI 0.948 -0.0140 1.47e-2 0.912 -0.214 0.578 0.845 0.0369 0.748 0.869 0.0367 0.811 0.671 0.875 0.419 0.936 0.844 0.732
## 2 CN 0.946 0.00506 5.75e-2 0.538 -0.173 0.620 0.865 0.572 0.825 0.516 0.0536 0.783 0.793 0.920 0.442 0.915 0.855 0.687
## 3 CN 0.942 0.0291 8.37e-2 0.504 -0.00367 0.624 0.241 0.0318 0.818 0.903 0.0597 0.435 0.661 0.880 0.436 0.926 0.779 0.850
## 4 CI 0.943 -0.0323 -1.12e-2 0.904 -0.187 0.599 0.848 0.0383 0.758 0.531 0.609 0.850 0.808 0.915 0.957 0.629 0.826 0.680
## 5 CI 0.946 0.0529 1.65e-5 0.896 0.0268 0.631 0.821 0.930 0.826 0.926 0.0883 0.854 0.0829 0.917 0.946 0.929 0.330 0.891
## 6 CN 0.951 -0.00869 1.57e-2 0.886 -0.0379 0.615 0.784 0.540 0.210 0.894 0.919 0.738 0.845 0.923 0.399 0.612 0.854 0.837
## # ℹ 232 more variables: cg19503462 <dbl>, cg26853071 <dbl>, age.now <dbl>, cg11540596 <dbl>, cg00415024 <dbl>, cg10701746 <dbl>, cg00004073 <dbl>, cg11227702 <dbl>, cg19471911 <dbl>,
## # cg09727210 <dbl>, cg00154902 <dbl>, cg17002719 <dbl>, cg07504457 <dbl>, cg25879395 <dbl>, cg01008088 <dbl>, cg02225060 <dbl>, cg12543766 <dbl>, cg09120722 <dbl>, cg11787167 <dbl>,
## # cg19799454 <dbl>, cg02887598 <dbl>, cg01128042 <dbl>, cg21697769 <dbl>, cg25208881 <dbl>, cg16779438 <dbl>, cg17386240 <dbl>, cg03088219 <dbl>, cg24883219 <dbl>, cg15535896 <dbl>,
## # cg16338321 <dbl>, cg21757617 <dbl>, cg18285382 <dbl>, cg17429539 <dbl>, cg10738648 <dbl>, cg02078724 <dbl>, cg09015880 <dbl>, cg20823859 <dbl>, cg18816397 <dbl>, cg16431720 <dbl>,
## # cg06833284 <dbl>, cg23517115 <dbl>, cg11438323 <dbl>, cg02932958 <dbl>, cg08096656 <dbl>, cg05064044 <dbl>, cg05234269 <dbl>, cg25169289 <dbl>, cg14710850 <dbl>, cg26679884 <dbl>,
## # cg03600007 <dbl>, cg15098922 <dbl>, cg01921484 <dbl>, cg16715186 <dbl>, cg06961873 <dbl>, cg12240569 <dbl>, cg01910713 <dbl>, cg25712921 <dbl>, cg00648024 <dbl>, cg03982462 <dbl>,
## # cg08745107 <dbl>, cg26983017 <dbl>, cg00084271 <dbl>, cg16858433 <dbl>, cg06371647 <dbl>, cg26846609 <dbl>, cg15184869 <dbl>, cg13573375 <dbl>, cg04831745 <dbl>, cg22931151 <dbl>, …
dim(output_mean_process)
## [1] 648 251
Selected_median_imp <- head(combined_importance_quantiles,n = Number_fea_input)
print(head(Selected_median_imp))
## Feature 0% 25% 50% 75% 100%
## 275 cg23432430 0.491990101 0.62272860 1.0000000 1.0000000 1.0000000
## 313 PC3 0.000000000 0.16666667 0.5539209 1.0000000 1.0000000
## 1 age.now 0.003392297 0.00416345 0.5000000 0.5289698 0.6349744
## 311 PC1 0.032231598 0.30910796 0.5000000 0.6684105 0.7598925
## 312 PC2 0.333333333 0.38044700 0.4308152 0.5468701 0.8452667
## 104 cg07158503 0.310031430 0.40415432 0.4231865 0.5000000 0.5113692
Selected_median_imp_Name<-Selected_median_imp$Feature
print(head(Selected_median_imp_Name))
## [1] "cg23432430" "PC3" "age.now" "PC1" "PC2" "cg07158503"
df_selected_Median <- processed_dataFrame[,c("DX",Selected_median_imp_Name)]
output_median_feature<-processed_data[,c("DX",Selected_median_imp_Name)]
print(head(df_selected_Median))
## DX cg23432430 PC3 age.now PC1 PC2 cg07158503 cg00962106 cg07634717 cg06697310 cg14168080 cg03660162 cg02225060 cg07504457 cg10701746 cg20678988 cg09015880
## 200223270003_R02C01 CI 0.9482702 -0.014043316 82.4 -0.214185447 0.01470293 0.5777146 0.9124898 0.7483382 0.8454609 0.4190123 0.8691767 0.6828159 0.7116230 0.4795503 0.8438718 0.5101716
## 200223270003_R03C01 CN 0.9455418 0.005055871 78.6 -0.172761185 0.05745834 0.6203543 0.5375751 0.8254434 0.8653044 0.4420256 0.5160770 0.8265195 0.6854539 0.4868342 0.8548886 0.8402106
## 200223270003_R06C01 CN 0.9418716 0.029143653 80.4 -0.003667305 0.08372861 0.6236025 0.5040948 0.8181246 0.2405168 0.4355521 0.9026304 0.5209552 0.7205633 0.4927257 0.7786685 0.8472063
## cg19799454 cg00004073 cg00154902 cg02887598 cg09727210 cg11227702 cg11331837 cg16338321 cg24851651 cg25208881 cg19503462 cg03749159 cg03088219 cg26081710 cg09120722 cg11787167
## 200223270003_R02C01 0.9178930 0.02928535 0.5137741 0.04020908 0.4240111 0.86486075 0.03692842 0.5350242 0.03674702 0.1851956 0.7951675 0.9355921 0.844002862 0.8751040 0.5878977 0.03853894
## 200223270003_R03C01 0.9106247 0.02787198 0.8540746 0.67073881 0.8812928 0.49184121 0.57150125 0.8294062 0.05358297 0.9092286 0.4537684 0.9153921 0.007435243 0.9198212 0.8287506 0.04673831
## 200223270003_R06C01 0.9066551 0.64576463 0.8188126 0.73408417 0.8493743 0.02543724 0.03182862 0.4918708 0.05968923 0.9265502 0.6997359 0.9255807 0.120155222 0.8801892 0.8793344 0.32564508
## cg12543766 cg19471911 cg11540596 cg01921484 cg00415024 cg12689021 cg21757617 cg01128042 cg17002719 cg16715186 cg05234269 cg12421087 cg05064044 cg15184869 cg23517115 cg00819121
## 200223270003_R02C01 0.51028134 0.6334393 0.9238951 0.9098550 0.4299553 0.7706828 0.03652647 0.9113420 0.04939181 0.2742789 0.93848584 0.5647607 0.5672851 0.8622328 0.2151144 0.9207001
## 200223270003_R03C01 0.88741539 0.8437175 0.8926595 0.9093137 0.3999122 0.7449475 0.44299089 0.5328806 0.40466475 0.7946153 0.57461229 0.5399655 0.5358875 0.8996252 0.9131440 0.9281472
## 200223270003_R06C01 0.02818501 0.6127952 0.8820252 0.9204487 0.7465084 0.7872237 0.44725379 0.5222757 0.51428089 0.8124316 0.02467208 0.5400348 0.5273964 0.8688117 0.8328364 0.9327211
## cg11019791 cg04156077 cg01910713 cg16779438 cg25169289 cg03979311 cg14710850 cg00648024 cg25712921 cg27272246 cg18816397 cg18285382 cg08096656 cg15535896 cg13573375 cg20673830
## 200223270003_R02C01 0.8112324 0.7321883 0.8573169 0.8826150 0.1100884 0.86644909 0.8048592 0.51410972 0.2829848 0.8615873 0.5472925 0.3202927 0.9362594 0.3382952 0.8670419 0.2422052
## 200223270003_R03C01 0.7831231 0.6865805 0.8538850 0.5466924 0.7667174 0.06199853 0.8090950 0.40202875 0.6220919 0.8705287 0.4940355 0.2930577 0.9314878 0.9253926 0.1733934 0.6881735
## 200223270003_R06C01 0.4353250 0.8501188 0.8110366 0.8629492 0.2264993 0.72615553 0.8285902 0.05579011 0.6384003 0.8103777 0.5337018 0.8923595 0.4943033 0.3320191 0.8888246 0.2134634
## cg26853071 cg15600437 cg16431720 cg25436480 cg27577781 cg06277607 cg08745107 cg03982462 cg25879395 cg20823859 cg06960717 cg06961873 cg10738648 cg20685672 cg09584650 cg07640670
## 200223270003_R02C01 0.4233820 0.4885353 0.7356099 0.8425160 0.8143535 0.10744587 0.02921338 0.8562777 0.88130864 0.9030711 0.7030978 0.5335591 0.44931577 0.6712101 0.08230254 0.58296513
## 200223270003_R03C01 0.7451354 0.4894487 0.8692449 0.4994032 0.8113185 0.09353494 0.78542320 0.6023731 0.02603438 0.6062985 0.7653402 0.5472606 0.49894016 0.7932091 0.09661586 0.55225610
## 200223270003_R06C01 0.4228079 0.8551374 0.8773137 0.3494312 0.8144274 0.09504696 0.02709928 0.8778458 0.91060615 0.8917348 0.7206218 0.9415177 0.05552024 0.6613646 0.52399749 0.04058533
## cg12702014 cg16858433 cg00512739 cg15098922 cg26679884 cg16536985 cg24883219 cg05876883 cg06371647 cg02823329 cg12556569 cg22666875 cg13387643 cg09216282 cg02078724 cg15700429
## 200223270003_R02C01 0.7704049 0.9184356 0.9337648 0.9286092 0.6793815 0.5789643 0.6430473 0.9039064 0.8336894 0.9462397 0.06218231 0.8177182 0.4229959 0.9349248 0.3096774 0.7879010
## 200223270003_R03C01 0.7848681 0.9194211 0.8863895 0.9027517 0.1848705 0.5418687 0.6822115 0.9223308 0.8198684 0.6464005 0.03924599 0.8291957 0.4200273 0.9244259 0.2896133 0.9114530
## 200223270003_R06C01 0.8065993 0.9271632 0.9242748 0.8525611 0.1701734 0.8392044 0.5296903 0.4697980 0.8069537 0.9633930 0.48636893 0.3694180 0.4161488 0.9263996 0.2805612 0.8838233
## cg17429539 cg08584917 cg01608425 cg08788093 cg22542451 cg00084271 cg21697769 cg05593887 cg18918831 cg08198851 cg22931151 cg18857647 cg18150287 cg00939409 cg01008088 cg17723206
## 200223270003_R02C01 0.7860900 0.5663205 0.9030410 0.03911678 0.5884356 0.8103611 0.8946108 0.5939220 0.4891660 0.6578905 0.9311023 0.8582332 0.7685695 0.2652180 0.8424817 0.92881042
## 200223270003_R03C01 0.7100923 0.9019732 0.9264388 0.60934160 0.8337068 0.7877006 0.2822953 0.5766550 0.5333801 0.6578186 0.9356702 0.8394132 0.7519166 0.8882671 0.2417656 0.48556255
## 200223270003_R06C01 0.7660838 0.9187789 0.8887753 0.88380243 0.8125084 0.7706165 0.8698740 0.9148338 0.6406575 0.1272153 0.9328614 0.2647491 0.2501173 0.8842646 0.2618620 0.01765023
## cg05321907 cg12776173 cg02932958 cg09247979 cg14170504 cg25306893 cg25758034 cg25649515 cg22305850 cg13405878 cg14687298 cg12240569 cg19301366 cg05161773 cg11133939 cg01933473
## 200223270003_R02C01 0.2880477 0.1038804 0.7901008 0.5070956 0.54915621 0.6265392 0.6114028 0.9279829 0.03361934 0.4549662 0.04206702 0.82772064 0.8831393 0.4120912 0.1282694 0.2589014
## 200223270003_R03C01 0.1782629 0.8730635 0.4210489 0.5706177 0.02236650 0.8330282 0.6649219 0.9235753 0.57522232 0.7858042 0.14813581 0.02690547 0.8072679 0.4154907 0.5920898 0.6726133
## 200223270003_R06C01 0.8427929 0.7009491 0.3825995 0.5090215 0.02988245 0.6175380 0.2393844 0.5895839 0.58548744 0.7583938 0.24260002 0.46030640 0.8796022 0.8526849 0.5127706 0.2642560
## cg26983017 cg24697433 cg18993517 cg02122327 cg11706829 cg17906851 cg17386240 cg15633912 cg16571124 cg03549208 cg02495179 cg06880438 cg10681981 cg13739190 cg09785377 cg11438323
## 200223270003_R02C01 0.89868232 0.9243095 0.2091538 0.38940091 0.8897234 0.9488392 0.7473400 0.1605530 0.9282854 0.9014487 0.6813307 0.8285145 0.7035090 0.8510103 0.9162088 0.4863471
## 200223270003_R03C01 0.03145466 0.6808390 0.2665896 0.37769608 0.5444785 0.9529718 0.7144809 0.9333421 0.9206431 0.8381784 0.7373055 0.7988881 0.7382662 0.8358482 0.9226292 0.8984559
## 200223270003_R06C01 0.84677625 0.6384606 0.2574003 0.04017909 0.5669449 0.6462151 0.8074824 0.8737362 0.9276842 0.9097817 0.5588114 0.7839538 0.6971989 0.8419471 0.6405193 0.8722772
## cg22071943 cg26846609 cg24634455 cg01280698 cg06833284 cg02668233 cg04831745 cg00322003 cg01662749 cg24307368 cg04497611 cg00146240 cg00696044 cg02627240 cg03672288 cg03737947
## 200223270003_R02C01 0.8705217 0.48860949 0.7796391 0.8985067 0.9125144 0.4708431 0.61984995 0.1759911 0.3506201 0.64323677 0.9086359 0.6336151 0.55608424 0.66706843 0.9235592 0.91824910
## 200223270003_R03C01 0.2442648 0.04878986 0.5188241 0.8846201 0.9003482 0.8841930 0.71214149 0.5702070 0.2510946 0.34980461 0.8818513 0.8957183 0.07552381 0.57129408 0.6718625 0.92067153
## 200223270003_R06C01 0.2644581 0.48026945 0.5325725 0.8847132 0.6097933 0.4575646 0.06871768 0.3077122 0.8061480 0.02720398 0.5853116 0.1433218 0.79270858 0.05309659 0.9007629 0.03638091
## cg04316537 cg06118351 cg06403901 cg06483046 cg06864789 cg07138269 cg08554146 cg08857872 cg10240127 cg11187460 cg11286989 cg11314779 cg12228670 cg13372276 cg13653328 cg14293999
## 200223270003_R02C01 0.8074830 0.3633940 0.92790690 0.04383925 0.05369415 0.5002290 0.8982080 0.3395280 0.9250553 0.03672179 0.7590008 0.0242134 0.8632174 0.04888111 0.9245434 0.2836710
## 200223270003_R03C01 0.8453340 0.4714860 0.04783341 0.50720277 0.46053125 0.9426707 0.8963074 0.8181845 0.9403255 0.92516409 0.8533989 0.8966100 0.8496212 0.62396373 0.5122938 0.9172023
## 200223270003_R06C01 0.4351695 0.8655962 0.05253626 0.89604910 0.87513655 0.5057781 0.8213878 0.2970779 0.9056974 0.03109553 0.7313884 0.8908661 0.8738949 0.59693465 0.9362798 0.9168166
## cg14532717 cg14780448 cg15730644 cg15985500 cg17002338 cg17042243 cg17738613 cg18819889 cg18949721 cg21986118 cg23066280 cg23916408 cg24139837 cg25277809 cg27160885 cg05392160
## 200223270003_R02C01 0.5732280 0.9119141 0.4803181 0.8555262 0.9286251 0.2502905 0.6879612 0.9156157 0.2334245 0.6658175 0.07247841 0.1942275 0.07404605 0.1632342 0.2231606 0.9328933
## 200223270003_R03C01 0.1107638 0.6702102 0.4353906 0.8312198 0.2684163 0.2933475 0.6582258 0.9004455 0.2437792 0.6571296 0.57174588 0.9154993 0.04183445 0.4913711 0.8263885 0.2576881
## 200223270003_R06C01 0.6273416 0.6207355 0.8763048 0.8492103 0.2811103 0.2725457 0.1022257 0.9054439 0.2523095 0.7034445 0.80814756 0.8886255 0.05657120 0.5952124 0.2121179 0.8920726
## cg02631626 cg23352245 cg21139150 cg04124201 cg10666341 cg18339359 cg22169467 cg04888234 cg25059696 cg06715136 cg03600007 cg10091792 cg14192979 cg20078646 cg27224751 cg04412904
## 200223270003_R02C01 0.6280766 0.9377232 0.01853264 0.8686421 0.9046648 0.8824858 0.3095010 0.8379655 0.9017504 0.3400192 0.5658487 0.8670733 0.06336040 0.06198170 0.44503947 0.05088595
## 200223270003_R03C01 0.1951736 0.9375774 0.43223243 0.3308589 0.6731062 0.9040272 0.2978585 0.4376314 0.3047156 0.9259109 0.6018832 0.5864221 0.06019651 0.89537412 0.03214912 0.07717659
## 200223270003_R06C01 0.2699849 0.5932742 0.43772680 0.3241613 0.6443180 0.8552121 0.8955853 0.8039047 0.3051179 0.9079807 0.8611166 0.6087997 0.52114282 0.08725521 0.83123722 0.08253743
## cg17129965 cg14507637 cg14307563 cg20981163 cg22535849 cg18029737 cg14627380 cg10788927 cg08041188 cg13226272 cg11247378 cg02772171 cg04462915 cg03221390 cg22112152 cg04664583
## 200223270003_R02C01 0.8972140 0.9051258 0.1855966 0.8990628 0.8847704 0.9100454 0.9455369 0.8973154 0.7752456 0.02637249 0.1591185 0.9182018 0.03224861 0.5859063 0.8476101 0.5572814
## 200223270003_R03C01 0.8806673 0.9009460 0.8916957 0.9264076 0.8609966 0.9016634 0.9258964 0.2021398 0.3201255 0.54100016 0.7874849 0.5660559 0.50740695 0.9180706 0.8014136 0.5881190
## 200223270003_R06C01 0.8857237 0.9013686 0.8750052 0.4874651 0.8808022 0.7376586 0.5789898 0.2053075 0.7900939 0.44370701 0.4807942 0.8995479 0.02700644 0.6399867 0.7897897 0.9352717
## cg20803293 cg09451339 cg16733676 cg22741595 cg04242342 cg00295418 cg06012903 cg00345083 cg10039445 cg13368637 cg04718469 cg16089727 cg06231502 cg02550738 cg05850457 cg08896901
## 200223270003_R02C01 0.54933918 0.2243746 0.9057228 0.6525533 0.8206769 0.44954665 0.7964595 0.47960968 0.8833873 0.5597507 0.8687522 0.86748697 0.7784451 0.6201457 0.8183013 0.3581911
## 200223270003_R03C01 0.07935747 0.2340702 0.8904541 0.1730013 0.8167892 0.48471295 0.1933431 0.50833875 0.8954055 0.9100088 0.7256813 0.54996692 0.7964278 0.9011727 0.8313023 0.2467071
## 200223270003_R06C01 0.42191244 0.8921284 0.1698111 0.1550739 0.8040357 0.02004532 0.1960773 0.03929249 0.8832807 0.8739205 0.8521881 0.05876736 0.7706160 0.9085849 0.8161364 0.9225209
## cg17268094 cg01549082 cg12146221 cg06394820 cg26901661 cg12784167 cg13815695 cg01462799 cg00322820 cg02356645
## 200223270003_R02C01 0.5774753 0.2924138 0.2049284 0.8513195 0.8951971 0.81503498 0.9267057 0.8284427 0.4869764 0.5105903
## 200223270003_R03C01 0.9003262 0.7065693 0.1814927 0.8695521 0.8754981 0.02811410 0.6859729 0.4038824 0.4858988 0.5833923
## 200223270003_R06C01 0.8789368 0.2895440 0.8619250 0.4415020 0.9021064 0.03073269 0.6509046 0.4676821 0.4754313 0.5701428
## [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
dim(df_selected_Median)
## [1] 648 251
print(Selected_median_imp_Name)
## [1] "cg23432430" "PC3" "age.now" "PC1" "PC2" "cg07158503" "cg00962106" "cg07634717" "cg06697310" "cg14168080" "cg03660162" "cg02225060" "cg07504457" "cg10701746" "cg20678988"
## [16] "cg09015880" "cg19799454" "cg00004073" "cg00154902" "cg02887598" "cg09727210" "cg11227702" "cg11331837" "cg16338321" "cg24851651" "cg25208881" "cg19503462" "cg03749159" "cg03088219" "cg26081710"
## [31] "cg09120722" "cg11787167" "cg12543766" "cg19471911" "cg11540596" "cg01921484" "cg00415024" "cg12689021" "cg21757617" "cg01128042" "cg17002719" "cg16715186" "cg05234269" "cg12421087" "cg05064044"
## [46] "cg15184869" "cg23517115" "cg00819121" "cg11019791" "cg04156077" "cg01910713" "cg16779438" "cg25169289" "cg03979311" "cg14710850" "cg00648024" "cg25712921" "cg27272246" "cg18816397" "cg18285382"
## [61] "cg08096656" "cg15535896" "cg13573375" "cg20673830" "cg26853071" "cg15600437" "cg16431720" "cg25436480" "cg27577781" "cg06277607" "cg08745107" "cg03982462" "cg25879395" "cg20823859" "cg06960717"
## [76] "cg06961873" "cg10738648" "cg20685672" "cg09584650" "cg07640670" "cg12702014" "cg16858433" "cg00512739" "cg15098922" "cg26679884" "cg16536985" "cg24883219" "cg05876883" "cg06371647" "cg02823329"
## [91] "cg12556569" "cg22666875" "cg13387643" "cg09216282" "cg02078724" "cg15700429" "cg17429539" "cg08584917" "cg01608425" "cg08788093" "cg22542451" "cg00084271" "cg21697769" "cg05593887" "cg18918831"
## [106] "cg08198851" "cg22931151" "cg18857647" "cg18150287" "cg00939409" "cg01008088" "cg17723206" "cg05321907" "cg12776173" "cg02932958" "cg09247979" "cg14170504" "cg25306893" "cg25758034" "cg25649515"
## [121] "cg22305850" "cg13405878" "cg14687298" "cg12240569" "cg19301366" "cg05161773" "cg11133939" "cg01933473" "cg26983017" "cg24697433" "cg18993517" "cg02122327" "cg11706829" "cg17906851" "cg17386240"
## [136] "cg15633912" "cg16571124" "cg03549208" "cg02495179" "cg06880438" "cg10681981" "cg13739190" "cg09785377" "cg11438323" "cg22071943" "cg26846609" "cg24634455" "cg01280698" "cg06833284" "cg02668233"
## [151] "cg04831745" "cg00322003" "cg01662749" "cg24307368" "cg04497611" "cg00146240" "cg00696044" "cg02627240" "cg03672288" "cg03737947" "cg04316537" "cg06118351" "cg06403901" "cg06483046" "cg06864789"
## [166] "cg07138269" "cg08554146" "cg08857872" "cg10240127" "cg11187460" "cg11286989" "cg11314779" "cg12228670" "cg13372276" "cg13653328" "cg14293999" "cg14532717" "cg14780448" "cg15730644" "cg15985500"
## [181] "cg17002338" "cg17042243" "cg17738613" "cg18819889" "cg18949721" "cg21986118" "cg23066280" "cg23916408" "cg24139837" "cg25277809" "cg27160885" "cg05392160" "cg02631626" "cg23352245" "cg21139150"
## [196] "cg04124201" "cg10666341" "cg18339359" "cg22169467" "cg04888234" "cg25059696" "cg06715136" "cg03600007" "cg10091792" "cg14192979" "cg20078646" "cg27224751" "cg04412904" "cg17129965" "cg14507637"
## [211] "cg14307563" "cg20981163" "cg22535849" "cg18029737" "cg14627380" "cg10788927" "cg08041188" "cg13226272" "cg11247378" "cg02772171" "cg04462915" "cg03221390" "cg22112152" "cg04664583" "cg20803293"
## [226] "cg09451339" "cg16733676" "cg22741595" "cg04242342" "cg00295418" "cg06012903" "cg00345083" "cg10039445" "cg13368637" "cg04718469" "cg16089727" "cg06231502" "cg02550738" "cg05850457" "cg08896901"
## [241] "cg17268094" "cg01549082" "cg12146221" "cg06394820" "cg26901661" "cg12784167" "cg13815695" "cg01462799" "cg00322820" "cg02356645"
print(head(output_median_feature))
## # A tibble: 6 × 251
## DX cg23432430 PC3 age.now PC1 PC2 cg07158503 cg00962106 cg07634717 cg06697310 cg14168080 cg03660162 cg02225060 cg07504457 cg10701746 cg20678988 cg09015880 cg19799454 cg00004073
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CI 0.948 -0.0140 82.4 -0.214 0.0147 0.578 0.912 0.748 0.845 0.419 0.869 0.683 0.712 0.480 0.844 0.510 0.918 0.0293
## 2 CN 0.946 0.00506 78.6 -0.173 0.0575 0.620 0.538 0.825 0.865 0.442 0.516 0.827 0.685 0.487 0.855 0.840 0.911 0.0279
## 3 CN 0.942 0.0291 80.4 -0.00367 0.0837 0.624 0.504 0.818 0.241 0.436 0.903 0.521 0.721 0.493 0.779 0.847 0.907 0.646
## 4 CI 0.943 -0.0323 78.2 -0.187 -0.0112 0.599 0.904 0.758 0.848 0.957 0.531 0.808 0.187 0.855 0.826 0.487 0.922 0.624
## 5 CI 0.946 0.0529 62.9 0.0268 0.0000165 0.631 0.896 0.826 0.821 0.946 0.926 0.608 0.235 0.488 0.330 0.889 0.914 0.412
## 6 CN 0.951 -0.00869 80.7 -0.0379 0.0157 0.615 0.886 0.210 0.784 0.399 0.894 0.764 0.730 0.842 0.854 0.906 0.921 0.393
## # ℹ 232 more variables: cg00154902 <dbl>, cg02887598 <dbl>, cg09727210 <dbl>, cg11227702 <dbl>, cg11331837 <dbl>, cg16338321 <dbl>, cg24851651 <dbl>, cg25208881 <dbl>, cg19503462 <dbl>,
## # cg03749159 <dbl>, cg03088219 <dbl>, cg26081710 <dbl>, cg09120722 <dbl>, cg11787167 <dbl>, cg12543766 <dbl>, cg19471911 <dbl>, cg11540596 <dbl>, cg01921484 <dbl>, cg00415024 <dbl>,
## # cg12689021 <dbl>, cg21757617 <dbl>, cg01128042 <dbl>, cg17002719 <dbl>, cg16715186 <dbl>, cg05234269 <dbl>, cg12421087 <dbl>, cg05064044 <dbl>, cg15184869 <dbl>, cg23517115 <dbl>,
## # cg00819121 <dbl>, cg11019791 <dbl>, cg04156077 <dbl>, cg01910713 <dbl>, cg16779438 <dbl>, cg25169289 <dbl>, cg03979311 <dbl>, cg14710850 <dbl>, cg00648024 <dbl>, cg25712921 <dbl>,
## # cg27272246 <dbl>, cg18816397 <dbl>, cg18285382 <dbl>, cg08096656 <dbl>, cg15535896 <dbl>, cg13573375 <dbl>, cg20673830 <dbl>, cg26853071 <dbl>, cg15600437 <dbl>, cg16431720 <dbl>,
## # cg25436480 <dbl>, cg27577781 <dbl>, cg06277607 <dbl>, cg08745107 <dbl>, cg03982462 <dbl>, cg25879395 <dbl>, cg20823859 <dbl>, cg06960717 <dbl>, cg06961873 <dbl>, cg10738648 <dbl>,
## # cg20685672 <dbl>, cg09584650 <dbl>, cg07640670 <dbl>, cg12702014 <dbl>, cg16858433 <dbl>, cg00512739 <dbl>, cg15098922 <dbl>, cg26679884 <dbl>, cg16536985 <dbl>, cg24883219 <dbl>, …
Choose the mutually important features that appear in at least half of the models (i.e., 3 in our case).
The frequency / common feature importance is computed as follows:
n_select_frequencyWay <- Number_fea_input
combined_importance_freq_ordered_df <- combined_importance_Avg_ordered
df_Selected_Frequency_Imp <- function(n_select_frequencyWay,FeatureImportanceTable){
# This function takes the feature importance data frame
# and applies the steps discussed above.
# The output is the feature frequency table
# (i.e., how often each feature appears among the Top-N features of each model).
# LRM
## All_impAvg_orderby_LRM
All_impAvg_orderby_LRM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_LRM1),]
## top_impAvg_orderby_LRM
top_impAvg_orderby_LRM <- head(All_impAvg_orderby_LRM,n = n_select_frequencyWay)
top_impAvg_orderby_LRM_NAME <- top_impAvg_orderby_LRM$Feature
# XGB
## All_impAvg_orderby_XGB
All_impAvg_orderby_XGB <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_XGB),]
## top_impAvg_orderby_XGB
top_impAvg_orderby_XGB <- head(All_impAvg_orderby_XGB,n = n_select_frequencyWay)
top_impAvg_orderby_XGB_NAME <- top_impAvg_orderby_XGB$Feature
# ENM
## all_impAvg_orderby_ENM
All_impAvg_orderby_ENM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_ENM1),]
## top_impAvg_orderby_ENM
top_impAvg_orderby_ENM <- head(All_impAvg_orderby_ENM,n = n_select_frequencyWay)
top_impAvg_orderby_ENM_NAME <- top_impAvg_orderby_ENM$Feature
# RF
## all_impAvg_orderby_RF
All_impAvg_orderby_RF <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_RF),]
## top_impAvg_orderby_RF
top_impAvg_orderby_RF <- head(All_impAvg_orderby_RF,n = n_select_frequencyWay)
top_impAvg_orderby_RF_NAME <- top_impAvg_orderby_RF$Feature
# SVM
## all_impAvg_orderby_SVM
All_impAvg_orderby_SVM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_SVM),]
## top_impAvg_orderby_SVM
top_impAvg_orderby_SVM <- head(All_impAvg_orderby_SVM,n = n_select_frequencyWay)
top_impAvg_orderby_SVM_NAME <- top_impAvg_orderby_SVM$Feature
# Combine all features into a unique collection
all_features <- unique(c(top_impAvg_orderby_LRM_NAME, top_impAvg_orderby_XGB_NAME, top_impAvg_orderby_ENM_NAME,top_impAvg_orderby_RF_NAME,top_impAvg_orderby_SVM_NAME))
models<-c("LRM","XGB","ENM","RF","SVM")
feature_matrix <- matrix(0, nrow = length(all_features), ncol = length(models),
dimnames = list(all_features, models))
# Fill the matrix, indicating presence (1) or absence (0) of each feature in each model
for (feature in all_features) {
feature_matrix[feature, "LRM"] <-
as.integer(feature %in% top_impAvg_orderby_LRM_NAME)
feature_matrix[feature, "XGB"] <-
as.integer(feature %in% top_impAvg_orderby_XGB_NAME)
feature_matrix[feature, "ENM"] <-
as.integer(feature %in% top_impAvg_orderby_ENM_NAME)
feature_matrix[feature, "RF"] <-
as.integer(feature %in% top_impAvg_orderby_RF_NAME)
feature_matrix[feature, "SVM"] <-
as.integer(feature %in% top_impAvg_orderby_SVM_NAME)
}
# Convert the matrix to a data frame
feature_df <- as.data.frame(feature_matrix)
feature_df$Total_Count <- rowSums(feature_df[,1:5])
# Sort the dataframe by the Total_Count in descending order
feature_df <- feature_df[order(-feature_df$Total_Count), ]
print(feature_df)
return(feature_df)
}
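The five per-model blocks in the function above all follow the same sort-then-take-top-N pattern. A more compact equivalent could loop over the importance columns; this sketch assumes the same column names as `combined_importance_freq_ordered_df` (the function name here is illustrative, not part of the pipeline):

```r
# Compact sketch of the same frequency count, assuming the importance
# columns are named as in combined_importance_freq_ordered_df.
freq_from_importance <- function(n, imp_df,
                                 cols = c(LRM = "Importance_LRM1", XGB = "Importance_XGB",
                                          ENM = "Importance_ENM1", RF  = "Importance_RF",
                                          SVM = "Importance_SVM")) {
  # Top-n feature names per model, sorted by descending importance
  top_by_model <- lapply(cols, function(cl)
    head(imp_df$Feature[order(-imp_df[[cl]])], n))
  all_features <- unique(unlist(top_by_model))
  # 0/1 presence matrix: one column per model
  out <- as.data.frame(sapply(top_by_model, function(nm)
    as.integer(all_features %in% nm)))
  rownames(out) <- all_features
  out$Total_Count <- rowSums(out)
  out[order(-out$Total_Count), ]
}
```

The per-model blocks in `df_Selected_Frequency_Imp` should give the same result; the loop simply removes the repetition.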
Now, the function will be tested below:
df_Func_test<-df_Selected_Frequency_Imp(NUM_COMMON_FEATURES_SET_Frequency,combined_importance_freq_ordered_df)
## LRM XGB ENM RF SVM Total_Count
## cg23432430 1 1 1 1 1 5
## PC2 1 1 1 1 0 4
## cg07158503 1 1 1 0 1 4
## PC3 1 0 1 1 0 3
## PC1 1 0 1 0 1 3
## cg00962106 1 0 1 0 1 3
## cg06697310 1 0 1 1 0 3
## cg26081710 1 0 1 0 1 3
## cg09727210 1 0 1 0 0 2
## cg02225060 1 0 1 0 0 2
## cg09015880 1 0 1 0 0 2
## cg16338321 1 0 1 0 0 2
## cg00819121 1 0 1 0 0 2
## cg00415024 1 0 0 1 0 2
## cg21757617 1 0 1 0 0 2
## cg14168080 1 1 0 0 0 2
## cg02887598 1 0 1 0 0 2
## cg05064044 1 0 1 0 0 2
## cg03660162 0 1 0 0 1 2
## cg07634717 0 1 0 0 1 2
## cg19799454 0 1 0 0 1 2
## cg20678988 0 1 0 0 1 2
## cg11019791 0 1 0 1 0 2
## cg10701746 1 0 0 0 0 1
## cg01910713 1 0 0 0 0 1
## age.now 0 1 0 0 0 1
## cg11438323 0 1 0 0 0 1
## cg11540596 0 1 0 0 0 1
## cg17002719 0 1 0 0 0 1
## cg09120722 0 1 0 0 0 1
## cg17002338 0 1 0 0 0 1
## cg11227702 0 1 0 0 0 1
## cg18816397 0 1 0 0 0 1
## cg02122327 0 1 0 0 0 1
## cg13573375 0 1 0 0 0 1
## cg03088219 0 1 0 0 0 1
## cg06277607 0 0 1 0 0 1
## cg27272246 0 0 1 0 0 1
## cg00004073 0 0 1 0 0 1
## cg17429539 0 0 1 0 0 1
## cg03749159 0 0 0 1 0 1
## cg11331837 0 0 0 1 0 1
## cg21697769 0 0 0 1 0 1
## cg01008088 0 0 0 1 0 1
## cg04768387 0 0 0 1 0 1
## cg16431720 0 0 0 1 0 1
## cg12784167 0 0 0 1 0 1
## cg23159970 0 0 0 1 0 1
## cg24851651 0 0 0 1 0 1
## cg17042243 0 0 0 1 0 1
## cg09451339 0 0 0 1 0 1
## cg17386240 0 0 0 1 0 1
## cg04109990 0 0 0 1 0 1
## cg14192979 0 0 0 1 0 1
## cg12333628 0 0 0 0 1 1
## cg20685672 0 0 0 0 1 1
## cg26853071 0 0 0 0 1 1
## cg24883219 0 0 0 0 1 1
## cg06833284 0 0 0 0 1 1
## cg03600007 0 0 0 0 1 1
## cg01280698 0 0 0 0 1 1
## cg13226272 0 0 0 0 1 1
## cg15775217 0 0 0 0 1 1
## cg04156077 0 0 0 0 1 1
## cg19503462 0 0 0 0 1 1
# The expected output should be zero.
sum(df_Func_test!=frequency_feature_df_RAW_ordered)
## [1] 0
Choose the mutually important features that appear in at least half of the models (i.e., 3 in our case).
The frequency / common feature importance is now computed with the requested number of features:
n_select_frequencyWay <- Number_fea_input
df_feature_Output_frequency <- df_Selected_Frequency_Imp(Number_fea_input,
combined_importance_freq_ordered_df)
## LRM XGB ENM RF SVM Total_Count
## PC1 1 1 1 1 1 5
## cg23432430 1 1 1 1 1 5
## cg09727210 1 1 1 1 1 5
## PC2 1 1 1 1 1 5
## cg00962106 1 1 1 1 1 5
## cg07158503 1 1 1 1 1 5
## cg06697310 1 1 1 1 1 5
## cg02225060 1 1 1 1 1 5
## cg09015880 1 1 1 1 1 5
## cg10701746 1 1 1 1 1 5
## cg16338321 1 1 1 1 1 5
## cg26081710 1 1 1 1 1 5
## cg00415024 1 1 1 1 1 5
## cg21757617 1 1 1 1 1 5
## cg14168080 1 1 1 1 1 5
## cg02887598 1 1 1 1 1 5
## cg05064044 1 1 1 1 1 5
## cg01910713 1 1 1 1 1 5
## cg11331837 1 1 1 1 1 5
## cg07504457 1 1 1 1 1 5
## cg00004073 1 1 1 1 1 5
## cg04156077 1 1 1 1 1 5
## cg10738648 1 1 1 1 1 5
## cg07640670 1 1 1 1 1 5
## cg16858433 1 1 1 1 1 5
## cg12543766 1 1 1 1 1 5
## cg20685672 1 1 1 1 1 5
## cg24851651 1 1 1 1 1 5
## cg20678988 1 1 1 1 1 5
## cg03088219 1 1 1 1 1 5
## cg16536985 1 1 1 1 1 5
## cg05234269 1 1 1 1 1 5
## cg18285382 1 1 1 1 1 5
## cg09216282 1 1 1 1 1 5
## cg00084271 1 1 1 1 1 5
## cg21697769 1 1 1 1 1 5
## cg15098922 1 1 1 1 1 5
## cg27577781 1 1 1 1 1 5
## cg18150287 1 1 1 1 1 5
## cg08096656 1 1 1 1 1 5
## cg19503462 1 1 1 1 1 5
## cg07634717 1 1 1 1 1 5
## cg26853071 1 1 1 1 1 5
## cg09247979 1 1 1 1 1 5
## cg00154902 1 1 1 1 1 5
## cg15184869 1 1 1 1 1 5
## cg19471911 1 1 1 1 1 5
## cg12702014 1 1 1 1 1 5
## cg03979311 1 1 1 1 1 5
## cg11787167 1 1 1 1 1 5
## cg18857647 1 1 1 1 1 5
## cg11540596 1 1 1 1 1 5
## cg25712921 1 1 1 1 1 5
## cg12240569 1 1 1 1 1 5
## cg19301366 1 1 1 1 1 5
## cg25436480 1 1 1 1 1 5
## cg13387643 1 1 1 1 1 5
## cg12421087 1 1 1 1 1 5
## cg11227702 1 1 1 1 1 5
## cg00648024 1 1 1 1 1 5
## cg17002719 1 1 1 1 1 5
## cg15633912 1 1 1 1 1 5
## cg16715186 1 1 1 1 1 5
## cg11019791 1 1 1 1 1 5
## cg06880438 1 1 1 1 1 5
## cg03660162 1 1 1 1 1 5
## cg01008088 1 1 1 1 1 5
## cg15535896 1 1 1 1 1 5
## cg15600437 1 1 1 1 1 5
## cg02078724 1 1 1 1 1 5
## cg20823859 1 1 1 1 1 5
## cg13372276 1 1 1 1 1 5
## cg25208881 1 1 1 1 1 5
## cg26679884 1 1 1 1 1 5
## cg01921484 1 1 1 1 1 5
## cg06960717 1 1 1 1 1 5
## cg25169289 1 1 1 1 1 5
## cg08584917 1 1 1 1 1 5
## cg22305850 1 1 1 1 1 5
## cg11133939 1 1 1 1 1 5
## cg01608425 1 1 1 1 1 5
## cg06371647 1 1 1 1 1 5
## cg03749159 1 1 1 1 1 5
## cg24697433 1 1 1 1 1 5
## cg21986118 1 1 1 1 1 5
## cg18816397 1 1 1 1 1 5
## cg01128042 1 1 1 1 1 5
## cg15700429 1 1 1 1 1 5
## cg25277809 1 1 1 1 1 5
## cg22931151 1 1 1 1 1 5
## cg24634455 1 1 1 1 1 5
## cg13405878 1 1 1 1 1 5
## cg02932958 1 1 1 1 1 5
## cg11286989 1 1 1 1 1 5
## cg05593887 1 1 1 1 1 5
## cg18918831 1 1 1 1 1 5
## cg11247378 1 1 1 1 1 5
## cg24139837 1 1 1 1 1 5
## cg17042243 1 1 1 1 1 5
## cg25879395 1 1 1 1 1 5
## cg18029737 1 1 1 1 1 5
## cg10681981 1 1 1 1 1 5
## cg26846609 1 1 1 1 1 5
## cg14293999 1 1 1 1 1 5
## cg10240127 1 1 1 1 1 5
## cg08198851 1 1 1 1 1 5
## cg18993517 1 1 1 1 1 5
## cg02823329 1 1 1 1 1 5
## cg08745107 1 1 1 1 1 5
## cg13573375 1 1 1 1 1 5
## cg17738613 1 1 1 1 1 5
## cg02356645 1 1 1 1 1 5
## cg05876883 1 1 1 1 1 5
## cg24883219 1 1 1 1 1 5
## cg00696044 1 1 1 1 1 5
## cg17131279 1 1 1 1 1 5
## cg08041188 1 1 1 1 1 5
## cg24307368 1 1 1 1 1 5
## cg06961873 1 1 1 1 1 5
## cg05392160 1 1 1 1 1 5
## cg26983017 1 1 1 1 1 5
## cg07138269 1 1 1 1 1 5
## cg04316537 1 1 1 1 1 5
## cg27224751 1 1 1 1 1 5
## cg04831745 1 1 1 1 1 5
## cg12556569 1 1 1 1 1 5
## cg17386240 1 1 1 1 1 5
## cg04412904 1 1 1 1 1 5
## cg00345083 1 1 1 1 1 5
## cg02668233 1 1 1 1 1 5
## cg10788927 1 1 1 1 1 5
## cg14687298 1 1 1 1 1 5
## cg14170504 1 1 1 1 1 5
## cg03672288 1 1 1 1 1 5
## cg14307563 1 1 1 1 1 5
## cg09451339 1 1 1 1 1 5
## cg16431720 1 1 1 1 1 5
## cg01662749 1 1 1 1 1 5
## cg02495179 1 1 1 1 1 5
## cg04768387 1 1 1 1 1 5
## cg17002338 1 1 1 1 1 5
## cg01933473 1 1 1 1 1 5
## cg16089727 1 1 1 1 1 5
## cg24643105 1 1 1 1 1 5
## PC3 1 0 1 1 1 4
## cg00819121 1 0 1 1 1 4
## cg09120722 1 1 1 0 1 4
## cg27272246 1 1 1 0 1 4
## cg06277607 1 1 1 1 0 4
## cg03982462 1 0 1 1 1 4
## cg09584650 1 1 1 1 0 4
## cg08788093 1 1 1 0 1 4
## cg22666875 1 1 1 1 0 4
## cg22542451 1 1 1 0 1 4
## cg00939409 1 1 1 0 1 4
## cg17723206 1 1 1 0 1 4
## cg05321907 1 0 1 1 1 4
## cg12776173 1 0 1 1 1 4
## cg25758034 1 1 1 0 1 4
## cg14710850 1 0 1 1 1 4
## cg23517115 1 0 1 1 1 4
## cg17429539 1 1 1 0 1 4
## cg17906851 1 1 1 1 0 4
## cg00512739 1 0 1 1 1 4
## cg12689021 1 0 1 1 1 4
## cg16571124 1 1 1 0 1 4
## [ reached 'max' / getOption("max.print") -- omitted 145 rows ]
all_out_features <- union(combined_importance_freq_ordered_df$Feature, rownames(df_feature_Output_frequency))
# Note that the combined importance table used here is the one before filtering.
# Combine them based on the common feature selection method:
# if a feature from the earlier importance table is absent here, add it with a value of zero.
feature_output_df_full <- data.frame(Feature = all_out_features)
feature_output_df_full <- merge(feature_output_df_full, df_feature_Output_frequency, by.x = "Feature", by.y = "row.names", all.x = TRUE)
feature_output_df_full[is.na(feature_output_df_full)] <- 0
# For top_impAvg_ordered
all_output_impAvg_ordered_full <- data.frame(Feature = all_out_features)
all_output_impAvg_ordered_full <- merge(combined_importance_freq_ordered_df,
all_output_impAvg_ordered_full,
by.x = "Feature",
by.y = "Feature",
all.x = TRUE)
all_output_impAvg_ordered_full[is.na(all_output_impAvg_ordered_full)] <- 0
all_Output_combined_df_impAvg <- merge(feature_output_df_full,
all_output_impAvg_ordered_full,
by = "Feature",
all = TRUE)
print(head(feature_output_df_full))
## Feature LRM XGB ENM RF SVM Total_Count
## 1 age.now 0 1 0 1 1 3
## 2 cg00004073 1 1 1 1 1 5
## 3 cg00084271 1 1 1 1 1 5
## 4 cg00086247 0 1 0 1 0 2
## 5 cg00146240 1 0 1 1 1 4
## 6 cg00154902 1 1 1 1 1 5
print(head(all_output_impAvg_ordered_full))
## Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 1 age.now 0.00416345 0.63497444 0.003392297 0.5289698 0.5000000 0.3343000
## 2 cg00004073 0.26934952 0.25232114 0.368743031 0.3932005 0.3333333 0.3233895
## 3 cg00084271 0.22358222 0.08451443 0.272790932 0.5066986 0.1666667 0.2508506
## 4 cg00086247 0.00000000 0.15625070 0.068094153 0.2757605 0.0000000 0.1000211
## 5 cg00146240 0.08729337 0.00000000 0.195466203 0.5233594 0.1666667 0.1945571
## 6 cg00154902 0.20586269 0.35714894 0.223748437 0.4724526 0.3333333 0.3185092
print(head(all_Output_combined_df_impAvg))
## Feature LRM XGB ENM RF SVM Total_Count Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 1 age.now 0 1 0 1 1 3 0.00416345 0.63497444 0.003392297 0.5289698 0.5000000 0.3343000
## 2 cg00004073 1 1 1 1 1 5 0.26934952 0.25232114 0.368743031 0.3932005 0.3333333 0.3233895
## 3 cg00084271 1 1 1 1 1 5 0.22358222 0.08451443 0.272790932 0.5066986 0.1666667 0.2508506
## 4 cg00086247 0 1 0 1 0 2 0.00000000 0.15625070 0.068094153 0.2757605 0.0000000 0.1000211
## 5 cg00146240 1 0 1 1 1 4 0.08729337 0.00000000 0.195466203 0.5233594 0.1666667 0.1945571
## 6 cg00154902 1 1 1 1 1 5 0.20586269 0.35714894 0.223748437 0.4724526 0.3333333 0.3185092
Choose the mutually important features: a feature is kept when it appears in the top selected important-feature lists of at least half of the models (i.e., 3 of 5 in our case).
if (METHOD_FEATURE_FLAG == 6) {
  df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count >= 3, ])
  df_process_Output_freq <- processed_data_m6_df[, c("DX", df_process_frequency_FeatureName)]
  output_Frequency_Feature <- processed_data_m6[, c("DX", df_process_frequency_FeatureName)]
  print(head(output_Frequency_Feature))
  print(paste("The number of final used features of common importance method:", length(df_process_frequency_FeatureName)))
  print(df_process_frequency_FeatureName)
  print(head(df_process_Output_freq))
}
if (METHOD_FEATURE_FLAG == 5) {
  df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count >= 3, ])
  df_process_Output_freq <- processed_data_m5_df[, c("DX", df_process_frequency_FeatureName)]
  output_Frequency_Feature <- processed_data_m5[, c("DX", df_process_frequency_FeatureName)]
  print(head(output_Frequency_Feature))
  print(paste("The number of final used features of common importance method:", length(df_process_frequency_FeatureName)))
  print(df_process_frequency_FeatureName)
  print(head(df_process_Output_freq))
}
if (METHOD_FEATURE_FLAG == 4) {
  df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count >= 3, ])
  df_process_Output_freq <- processed_data_m4_df[, c("DX", df_process_frequency_FeatureName)]
  output_Frequency_Feature <- processed_data_m4[, c("DX", df_process_frequency_FeatureName)]
  print(head(output_Frequency_Feature))
  print(paste("The number of final used features of common importance method:", length(df_process_frequency_FeatureName)))
  print(df_process_frequency_FeatureName)
  print(head(df_process_Output_freq))
}
if (METHOD_FEATURE_FLAG == 3) {
  df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count >= 3, ])
  df_process_Output_freq <- processed_data_m3_df[, c("DX", df_process_frequency_FeatureName)]
  output_Frequency_Feature <- processed_data_m3[, c("DX", df_process_frequency_FeatureName)]
  print(head(output_Frequency_Feature))
  print(paste("The number of final used features of common importance method:", length(df_process_frequency_FeatureName)))
  print(df_process_frequency_FeatureName)
  print(head(df_process_Output_freq))
}
## # A tibble: 6 × 272
## DX PC1 cg23432430 cg09727210 PC2 cg00962106 cg07158503 cg06697310 cg02225060 cg09015880 cg10701746 cg16338321 cg26081710 cg00415024 cg21757617 cg14168080 cg02887598 cg05064044
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CI -0.214 0.948 0.424 0.0147 0.912 0.578 0.845 0.683 0.510 0.480 0.535 0.875 0.430 0.0365 0.419 0.0402 0.567
## 2 CN -0.173 0.946 0.881 0.0575 0.538 0.620 0.865 0.827 0.840 0.487 0.829 0.920 0.400 0.443 0.442 0.671 0.536
## 3 CN -0.00367 0.942 0.849 0.0837 0.504 0.624 0.241 0.521 0.847 0.493 0.492 0.880 0.747 0.447 0.436 0.734 0.527
## 4 CI -0.187 0.943 0.842 -0.0112 0.904 0.599 0.848 0.808 0.487 0.855 0.525 0.915 0.770 0.434 0.957 0.864 0.628
## 5 CI 0.0268 0.946 0.425 0.0000165 0.896 0.631 0.821 0.608 0.889 0.488 0.842 0.917 0.742 0.747 0.946 0.836 0.566
## 6 CN -0.0379 0.951 0.460 0.0157 0.886 0.615 0.784 0.764 0.906 0.842 0.842 0.923 0.761 0.774 0.399 0.412 0.0830
## # ℹ 254 more variables: cg01910713 <dbl>, cg11331837 <dbl>, cg07504457 <dbl>, cg00004073 <dbl>, cg04156077 <dbl>, cg10738648 <dbl>, cg07640670 <dbl>, cg16858433 <dbl>, cg12543766 <dbl>,
## # cg20685672 <dbl>, cg24851651 <dbl>, cg20678988 <dbl>, cg03088219 <dbl>, cg16536985 <dbl>, cg05234269 <dbl>, cg18285382 <dbl>, cg09216282 <dbl>, cg00084271 <dbl>, cg21697769 <dbl>,
## # cg15098922 <dbl>, cg27577781 <dbl>, cg18150287 <dbl>, cg08096656 <dbl>, cg19503462 <dbl>, cg07634717 <dbl>, cg26853071 <dbl>, cg09247979 <dbl>, cg00154902 <dbl>, cg15184869 <dbl>,
## # cg19471911 <dbl>, cg12702014 <dbl>, cg03979311 <dbl>, cg11787167 <dbl>, cg18857647 <dbl>, cg11540596 <dbl>, cg25712921 <dbl>, cg12240569 <dbl>, cg19301366 <dbl>, cg25436480 <dbl>,
## # cg13387643 <dbl>, cg12421087 <dbl>, cg11227702 <dbl>, cg00648024 <dbl>, cg17002719 <dbl>, cg15633912 <dbl>, cg16715186 <dbl>, cg11019791 <dbl>, cg06880438 <dbl>, cg03660162 <dbl>,
## # cg01008088 <dbl>, cg15535896 <dbl>, cg15600437 <dbl>, cg02078724 <dbl>, cg20823859 <dbl>, cg13372276 <dbl>, cg25208881 <dbl>, cg26679884 <dbl>, cg01921484 <dbl>, cg06960717 <dbl>,
## # cg25169289 <dbl>, cg08584917 <dbl>, cg22305850 <dbl>, cg11133939 <dbl>, cg01608425 <dbl>, cg06371647 <dbl>, cg03749159 <dbl>, cg24697433 <dbl>, cg21986118 <dbl>, cg18816397 <dbl>, …
## [1] "The number of final used features of common importance method: 271"
## [1] "PC1" "cg23432430" "cg09727210" "PC2" "cg00962106" "cg07158503" "cg06697310" "cg02225060" "cg09015880" "cg10701746" "cg16338321" "cg26081710" "cg00415024" "cg21757617" "cg14168080"
## [16] "cg02887598" "cg05064044" "cg01910713" "cg11331837" "cg07504457" "cg00004073" "cg04156077" "cg10738648" "cg07640670" "cg16858433" "cg12543766" "cg20685672" "cg24851651" "cg20678988" "cg03088219"
## [31] "cg16536985" "cg05234269" "cg18285382" "cg09216282" "cg00084271" "cg21697769" "cg15098922" "cg27577781" "cg18150287" "cg08096656" "cg19503462" "cg07634717" "cg26853071" "cg09247979" "cg00154902"
## [46] "cg15184869" "cg19471911" "cg12702014" "cg03979311" "cg11787167" "cg18857647" "cg11540596" "cg25712921" "cg12240569" "cg19301366" "cg25436480" "cg13387643" "cg12421087" "cg11227702" "cg00648024"
## [61] "cg17002719" "cg15633912" "cg16715186" "cg11019791" "cg06880438" "cg03660162" "cg01008088" "cg15535896" "cg15600437" "cg02078724" "cg20823859" "cg13372276" "cg25208881" "cg26679884" "cg01921484"
## [76] "cg06960717" "cg25169289" "cg08584917" "cg22305850" "cg11133939" "cg01608425" "cg06371647" "cg03749159" "cg24697433" "cg21986118" "cg18816397" "cg01128042" "cg15700429" "cg25277809" "cg22931151"
## [91] "cg24634455" "cg13405878" "cg02932958" "cg11286989" "cg05593887" "cg18918831" "cg11247378" "cg24139837" "cg17042243" "cg25879395" "cg18029737" "cg10681981" "cg26846609" "cg14293999" "cg10240127"
## [106] "cg08198851" "cg18993517" "cg02823329" "cg08745107" "cg13573375" "cg17738613" "cg02356645" "cg05876883" "cg24883219" "cg00696044" "cg17131279" "cg08041188" "cg24307368" "cg06961873" "cg05392160"
## [121] "cg26983017" "cg07138269" "cg04316537" "cg27224751" "cg04831745" "cg12556569" "cg17386240" "cg04412904" "cg00345083" "cg02668233" "cg10788927" "cg14687298" "cg14170504" "cg03672288" "cg14307563"
## [136] "cg09451339" "cg16431720" "cg01662749" "cg02495179" "cg04768387" "cg17002338" "cg01933473" "cg16089727" "cg24643105" "PC3" "cg00819121" "cg09120722" "cg27272246" "cg06277607" "cg03982462"
## [151] "cg09584650" "cg08788093" "cg22666875" "cg22542451" "cg00939409" "cg17723206" "cg05321907" "cg12776173" "cg25758034" "cg14710850" "cg23517115" "cg17429539" "cg17906851" "cg00512739" "cg12689021"
## [166] "cg16571124" "cg22071943" "cg25649515" "cg04497611" "cg15730644" "cg13739190" "cg25306893" "cg16779438" "cg06483046" "cg14780448" "cg06833284" "cg14507637" "cg18819889" "cg03549208" "cg15985500"
## [181] "cg05161773" "cg06403901" "cg22169467" "cg08857872" "cg11187460" "cg03600007" "cg05850457" "cg06715136" "cg10091792" "cg03221390" "cg02122327" "cg21139150" "cg14192979" "cg23352245" "cg00146240"
## [196] "cg20981163" "cg27160885" "cg00553601" "cg12146221" "cg13226272" "cg22112152" "cg23836570" "cg08554146" "cg09785377" "cg01462799" "cg06118351" "cg17129965" "cg18339359" "cg11438323" "cg00295418"
## [211] "cg08896901" "cg18526121" "cg02550738" "cg04664583" "cg07028768" "cg01549082" "cg13815695" "cg02627240" "cg19799454" "cg06864789" "cg03737947" "cg14532717" "cg22535849" "cg04718469" "cg14627380"
## [226] "cg10039445" "cg02631626" "cg20673830" "cg17268094" "cg11706829" "cg16733676" "cg20078646" "cg13368637" "cg16652920" "cg26901661" "cg04888234" "cg04242342" "cg00322820" "cg23066280" "cg07480955"
## [241] "cg02772171" "cg21243064" "cg21388339" "cg01153376" "cg15775217" "cg02621446" "cg10666341" "cg23177161" "cg02246922" "cg25174111" "cg00322003" "cg15586958" "cg06231502" "age.now" "cg18949721"
## [256] "cg12228670" "cg11314779" "cg23916408" "cg01280698" "cg04124201" "cg12784167" "cg04645024" "cg16202259" "cg11268585" "cg15501526" "cg03084184" "cg12333628" "cg21783012" "cg13038195" "cg04867412"
## [271] "cg20803293"
## DX PC1 cg23432430 cg09727210 PC2 cg00962106 cg07158503 cg06697310 cg02225060 cg09015880 cg10701746 cg16338321 cg26081710 cg00415024 cg21757617 cg14168080
## 200223270003_R02C01 CI -0.214185447 0.9482702 0.4240111 0.01470293 0.9124898 0.5777146 0.8454609 0.6828159 0.5101716 0.4795503 0.5350242 0.8751040 0.4299553 0.03652647 0.4190123
## 200223270003_R03C01 CN -0.172761185 0.9455418 0.8812928 0.05745834 0.5375751 0.6203543 0.8653044 0.8265195 0.8402106 0.4868342 0.8294062 0.9198212 0.3999122 0.44299089 0.4420256
## 200223270003_R06C01 CN -0.003667305 0.9418716 0.8493743 0.08372861 0.5040948 0.6236025 0.2405168 0.5209552 0.8472063 0.4927257 0.4918708 0.8801892 0.7465084 0.44725379 0.4355521
## cg02887598 cg05064044 cg01910713 cg11331837 cg07504457 cg00004073 cg04156077 cg10738648 cg07640670 cg16858433 cg12543766 cg20685672 cg24851651 cg20678988 cg03088219 cg16536985
## 200223270003_R02C01 0.04020908 0.5672851 0.8573169 0.03692842 0.7116230 0.02928535 0.7321883 0.44931577 0.58296513 0.9184356 0.51028134 0.6712101 0.03674702 0.8438718 0.844002862 0.5789643
## 200223270003_R03C01 0.67073881 0.5358875 0.8538850 0.57150125 0.6854539 0.02787198 0.6865805 0.49894016 0.55225610 0.9194211 0.88741539 0.7932091 0.05358297 0.8548886 0.007435243 0.5418687
## 200223270003_R06C01 0.73408417 0.5273964 0.8110366 0.03182862 0.7205633 0.64576463 0.8501188 0.05552024 0.04058533 0.9271632 0.02818501 0.6613646 0.05968923 0.7786685 0.120155222 0.8392044
## cg05234269 cg18285382 cg09216282 cg00084271 cg21697769 cg15098922 cg27577781 cg18150287 cg08096656 cg19503462 cg07634717 cg26853071 cg09247979 cg00154902 cg15184869 cg19471911
## 200223270003_R02C01 0.93848584 0.3202927 0.9349248 0.8103611 0.8946108 0.9286092 0.8143535 0.7685695 0.9362594 0.7951675 0.7483382 0.4233820 0.5070956 0.5137741 0.8622328 0.6334393
## 200223270003_R03C01 0.57461229 0.2930577 0.9244259 0.7877006 0.2822953 0.9027517 0.8113185 0.7519166 0.9314878 0.4537684 0.8254434 0.7451354 0.5706177 0.8540746 0.8996252 0.8437175
## 200223270003_R06C01 0.02467208 0.8923595 0.9263996 0.7706165 0.8698740 0.8525611 0.8144274 0.2501173 0.4943033 0.6997359 0.8181246 0.4228079 0.5090215 0.8188126 0.8688117 0.6127952
## cg12702014 cg03979311 cg11787167 cg18857647 cg11540596 cg25712921 cg12240569 cg19301366 cg25436480 cg13387643 cg12421087 cg11227702 cg00648024 cg17002719 cg15633912 cg16715186
## 200223270003_R02C01 0.7704049 0.86644909 0.03853894 0.8582332 0.9238951 0.2829848 0.82772064 0.8831393 0.8425160 0.4229959 0.5647607 0.86486075 0.51410972 0.04939181 0.1605530 0.2742789
## 200223270003_R03C01 0.7848681 0.06199853 0.04673831 0.8394132 0.8926595 0.6220919 0.02690547 0.8072679 0.4994032 0.4200273 0.5399655 0.49184121 0.40202875 0.40466475 0.9333421 0.7946153
## 200223270003_R06C01 0.8065993 0.72615553 0.32564508 0.2647491 0.8820252 0.6384003 0.46030640 0.8796022 0.3494312 0.4161488 0.5400348 0.02543724 0.05579011 0.51428089 0.8737362 0.8124316
## cg11019791 cg06880438 cg03660162 cg01008088 cg15535896 cg15600437 cg02078724 cg20823859 cg13372276 cg25208881 cg26679884 cg01921484 cg06960717 cg25169289 cg08584917 cg22305850
## 200223270003_R02C01 0.8112324 0.8285145 0.8691767 0.8424817 0.3382952 0.4885353 0.3096774 0.9030711 0.04888111 0.1851956 0.6793815 0.9098550 0.7030978 0.1100884 0.5663205 0.03361934
## 200223270003_R03C01 0.7831231 0.7988881 0.5160770 0.2417656 0.9253926 0.4894487 0.2896133 0.6062985 0.62396373 0.9092286 0.1848705 0.9093137 0.7653402 0.7667174 0.9019732 0.57522232
## 200223270003_R06C01 0.4353250 0.7839538 0.9026304 0.2618620 0.3320191 0.8551374 0.2805612 0.8917348 0.59693465 0.9265502 0.1701734 0.9204487 0.7206218 0.2264993 0.9187789 0.58548744
## cg11133939 cg01608425 cg06371647 cg03749159 cg24697433 cg21986118 cg18816397 cg01128042 cg15700429 cg25277809 cg22931151 cg24634455 cg13405878 cg02932958 cg11286989 cg05593887
## 200223270003_R02C01 0.1282694 0.9030410 0.8336894 0.9355921 0.9243095 0.6658175 0.5472925 0.9113420 0.7879010 0.1632342 0.9311023 0.7796391 0.4549662 0.7901008 0.7590008 0.5939220
## 200223270003_R03C01 0.5920898 0.9264388 0.8198684 0.9153921 0.6808390 0.6571296 0.4940355 0.5328806 0.9114530 0.4913711 0.9356702 0.5188241 0.7858042 0.4210489 0.8533989 0.5766550
## 200223270003_R06C01 0.5127706 0.8887753 0.8069537 0.9255807 0.6384606 0.7034445 0.5337018 0.5222757 0.8838233 0.5952124 0.9328614 0.5325725 0.7583938 0.3825995 0.7313884 0.9148338
## cg18918831 cg11247378 cg24139837 cg17042243 cg25879395 cg18029737 cg10681981 cg26846609 cg14293999 cg10240127 cg08198851 cg18993517 cg02823329 cg08745107 cg13573375 cg17738613
## 200223270003_R02C01 0.4891660 0.1591185 0.07404605 0.2502905 0.88130864 0.9100454 0.7035090 0.48860949 0.2836710 0.9250553 0.6578905 0.2091538 0.9462397 0.02921338 0.8670419 0.6879612
## 200223270003_R03C01 0.5333801 0.7874849 0.04183445 0.2933475 0.02603438 0.9016634 0.7382662 0.04878986 0.9172023 0.9403255 0.6578186 0.2665896 0.6464005 0.78542320 0.1733934 0.6582258
## 200223270003_R06C01 0.6406575 0.4807942 0.05657120 0.2725457 0.91060615 0.7376586 0.6971989 0.48026945 0.9168166 0.9056974 0.1272153 0.2574003 0.9633930 0.02709928 0.8888246 0.1022257
## cg02356645 cg05876883 cg24883219 cg00696044 cg17131279 cg08041188 cg24307368 cg06961873 cg05392160 cg26983017 cg07138269 cg04316537 cg27224751 cg04831745 cg12556569 cg17386240
## 200223270003_R02C01 0.5105903 0.9039064 0.6430473 0.55608424 0.1900637 0.7752456 0.64323677 0.5335591 0.9328933 0.89868232 0.5002290 0.8074830 0.44503947 0.61984995 0.06218231 0.7473400
## 200223270003_R03C01 0.5833923 0.9223308 0.6822115 0.07552381 0.7048637 0.3201255 0.34980461 0.5472606 0.2576881 0.03145466 0.9426707 0.8453340 0.03214912 0.71214149 0.03924599 0.7144809
## 200223270003_R06C01 0.5701428 0.4697980 0.5296903 0.79270858 0.1492861 0.7900939 0.02720398 0.9415177 0.8920726 0.84677625 0.5057781 0.4351695 0.83123722 0.06871768 0.48636893 0.8074824
## cg04412904 cg00345083 cg02668233 cg10788927 cg14687298 cg14170504 cg03672288 cg14307563 cg09451339 cg16431720 cg01662749 cg02495179 cg04768387 cg17002338 cg01933473 cg16089727
## 200223270003_R02C01 0.05088595 0.47960968 0.4708431 0.8973154 0.04206702 0.54915621 0.9235592 0.1855966 0.2243746 0.7356099 0.3506201 0.6813307 0.3131047 0.9286251 0.2589014 0.86748697
## 200223270003_R03C01 0.07717659 0.50833875 0.8841930 0.2021398 0.14813581 0.02236650 0.6718625 0.8916957 0.2340702 0.8692449 0.2510946 0.7373055 0.9465814 0.2684163 0.6726133 0.54996692
## 200223270003_R06C01 0.08253743 0.03929249 0.4575646 0.2053075 0.24260002 0.02988245 0.9007629 0.8750052 0.8921284 0.8773137 0.8061480 0.5588114 0.9098563 0.2811103 0.2642560 0.05876736
## cg24643105 PC3 cg00819121 cg09120722 cg27272246 cg06277607 cg03982462 cg09584650 cg08788093 cg22666875 cg22542451 cg00939409 cg17723206 cg05321907 cg12776173 cg25758034
## 200223270003_R02C01 0.5303418 -0.014043316 0.9207001 0.5878977 0.8615873 0.10744587 0.8562777 0.08230254 0.03911678 0.8177182 0.5884356 0.2652180 0.92881042 0.2880477 0.1038804 0.6114028
## 200223270003_R03C01 0.5042688 0.005055871 0.9281472 0.8287506 0.8705287 0.09353494 0.6023731 0.09661586 0.60934160 0.8291957 0.8337068 0.8882671 0.48556255 0.1782629 0.8730635 0.6649219
## 200223270003_R06C01 0.9383050 0.029143653 0.9327211 0.8793344 0.8103777 0.09504696 0.8778458 0.52399749 0.88380243 0.3694180 0.8125084 0.8842646 0.01765023 0.8427929 0.7009491 0.2393844
## cg14710850 cg23517115 cg17429539 cg17906851 cg00512739 cg12689021 cg16571124 cg22071943 cg25649515 cg04497611 cg15730644 cg13739190 cg25306893 cg16779438 cg06483046 cg14780448
## 200223270003_R02C01 0.8048592 0.2151144 0.7860900 0.9488392 0.9337648 0.7706828 0.9282854 0.8705217 0.9279829 0.9086359 0.4803181 0.8510103 0.6265392 0.8826150 0.04383925 0.9119141
## 200223270003_R03C01 0.8090950 0.9131440 0.7100923 0.9529718 0.8863895 0.7449475 0.9206431 0.2442648 0.9235753 0.8818513 0.4353906 0.8358482 0.8330282 0.5466924 0.50720277 0.6702102
## 200223270003_R06C01 0.8285902 0.8328364 0.7660838 0.6462151 0.9242748 0.7872237 0.9276842 0.2644581 0.5895839 0.5853116 0.8763048 0.8419471 0.6175380 0.8629492 0.89604910 0.6207355
## cg06833284 cg14507637 cg18819889 cg03549208 cg15985500 cg05161773 cg06403901 cg22169467 cg08857872 cg11187460 cg03600007 cg05850457 cg06715136 cg10091792 cg03221390 cg02122327
## 200223270003_R02C01 0.9125144 0.9051258 0.9156157 0.9014487 0.8555262 0.4120912 0.92790690 0.3095010 0.3395280 0.03672179 0.5658487 0.8183013 0.3400192 0.8670733 0.5859063 0.38940091
## 200223270003_R03C01 0.9003482 0.9009460 0.9004455 0.8381784 0.8312198 0.4154907 0.04783341 0.2978585 0.8181845 0.92516409 0.6018832 0.8313023 0.9259109 0.5864221 0.9180706 0.37769608
## 200223270003_R06C01 0.6097933 0.9013686 0.9054439 0.9097817 0.8492103 0.8526849 0.05253626 0.8955853 0.2970779 0.03109553 0.8611166 0.8161364 0.9079807 0.6087997 0.6399867 0.04017909
## cg21139150 cg14192979 cg23352245 cg00146240 cg20981163 cg27160885 cg00553601 cg12146221 cg13226272 cg22112152 cg23836570 cg08554146 cg09785377 cg01462799 cg06118351 cg17129965
## 200223270003_R02C01 0.01853264 0.06336040 0.9377232 0.6336151 0.8990628 0.2231606 0.05601299 0.2049284 0.02637249 0.8476101 0.58688450 0.8982080 0.9162088 0.8284427 0.3633940 0.8972140
## 200223270003_R03C01 0.43223243 0.06019651 0.9375774 0.8957183 0.9264076 0.8263885 0.58957701 0.1814927 0.54100016 0.8014136 0.54259383 0.8963074 0.9226292 0.4038824 0.4714860 0.8806673
## 200223270003_R06C01 0.43772680 0.52114282 0.5932742 0.1433218 0.4874651 0.2121179 0.62426500 0.8619250 0.44370701 0.7897897 0.03267304 0.8213878 0.6405193 0.4676821 0.8655962 0.8857237
## cg18339359 cg11438323 cg00295418 cg08896901 cg18526121 cg02550738 cg04664583 cg07028768 cg01549082 cg13815695 cg02627240 cg19799454 cg06864789 cg03737947 cg14532717 cg22535849
## 200223270003_R02C01 0.8824858 0.4863471 0.44954665 0.3581911 0.4519781 0.6201457 0.5572814 0.4496851 0.2924138 0.9267057 0.66706843 0.9178930 0.05369415 0.91824910 0.5732280 0.8847704
## 200223270003_R03C01 0.9040272 0.8984559 0.48471295 0.2467071 0.4762313 0.9011727 0.5881190 0.8536078 0.7065693 0.6859729 0.57129408 0.9106247 0.46053125 0.92067153 0.1107638 0.8609966
## 200223270003_R06C01 0.8552121 0.8722772 0.02004532 0.9225209 0.4833367 0.9085849 0.9352717 0.8356936 0.2895440 0.6509046 0.05309659 0.9066551 0.87513655 0.03638091 0.6273416 0.8808022
## cg04718469 cg14627380 cg10039445 cg02631626 cg20673830 cg17268094 cg11706829 cg16733676 cg20078646 cg13368637 cg16652920 cg26901661 cg04888234 cg04242342 cg00322820 cg23066280
## 200223270003_R02C01 0.8687522 0.9455369 0.8833873 0.6280766 0.2422052 0.5774753 0.8897234 0.9057228 0.06198170 0.5597507 0.9436000 0.8951971 0.8379655 0.8206769 0.4869764 0.07247841
## 200223270003_R03C01 0.7256813 0.9258964 0.8954055 0.1951736 0.6881735 0.9003262 0.5444785 0.8904541 0.89537412 0.9100088 0.9431222 0.8754981 0.4376314 0.8167892 0.4858988 0.57174588
## 200223270003_R06C01 0.8521881 0.5789898 0.8832807 0.2699849 0.2134634 0.8789368 0.5669449 0.1698111 0.08725521 0.8739205 0.9457161 0.9021064 0.8039047 0.8040357 0.4754313 0.80814756
## cg07480955 cg02772171 cg21243064 cg21388339 cg01153376 cg15775217 cg02621446 cg10666341 cg23177161 cg02246922 cg25174111 cg00322003 cg15586958 cg06231502 age.now cg18949721
## 200223270003_R02C01 0.3874638 0.9182018 0.5191606 0.2756268 0.4872148 0.5707441 0.8731313 0.9046648 0.4151698 0.7301201 0.8526503 0.1759911 0.9058263 0.7784451 82.4 0.2334245
## 200223270003_R03C01 0.3916889 0.5660559 0.9167649 0.2102269 0.9639670 0.9168327 0.8095534 0.6731062 0.4586576 0.9447019 0.8573844 0.5702070 0.8957526 0.7964278 78.6 0.2437792
## 200223270003_R06C01 0.4043390 0.8995479 0.4862205 0.7649181 0.2242410 0.6042521 0.7511582 0.6443180 0.8287312 0.7202230 0.2567745 0.3077122 0.9121763 0.7706160 80.4 0.2523095
## cg12228670 cg11314779 cg23916408 cg01280698 cg04124201 cg12784167 cg04645024 cg16202259 cg11268585 cg15501526 cg03084184 cg12333628 cg21783012 cg13038195 cg04867412 cg20803293
## 200223270003_R02C01 0.8632174 0.0242134 0.1942275 0.8985067 0.8686421 0.81503498 0.7366541 0.9548726 0.2521544 0.6362531 0.8162981 0.9227884 0.9142369 0.45882213 0.04304823 0.54933918
## 200223270003_R03C01 0.8496212 0.8966100 0.9154993 0.8846201 0.3308589 0.02811410 0.8454827 0.3713483 0.8535791 0.6319253 0.7877128 0.9092861 0.6694884 0.02740132 0.87967997 0.07935747
## 200223270003_R06C01 0.8738949 0.8908661 0.8886255 0.8847132 0.3241613 0.03073269 0.0871902 0.4852461 0.9121931 0.7435100 0.4546397 0.5084647 0.9070112 0.46284376 0.44971146 0.42191244
## [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
if (METHOD_FEATURE_FLAG == 1) {
  df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count >= 3, ])
  df_process_Output_freq <- processed_data_m1_df[, c("DX", df_process_frequency_FeatureName)]
  output_Frequency_Feature <- processed_data_m1[, c("DX", df_process_frequency_FeatureName)]
  print(head(output_Frequency_Feature))
  print(paste("The number of final used features of common importance method:", length(df_process_frequency_FeatureName)))
  print(df_process_frequency_FeatureName)
  print(head(df_process_Output_freq))
}
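The per-flag blocks above differ only in which `processed_data_m<flag>` / `processed_data_m<flag>_df` pair they index, so they could be collapsed by looking the objects up by name. This is only a sketch: `select_common_features` is a hypothetical helper, and it assumes the `processed_data_m*` naming convention shown above.

```r
# Sketch of consolidating the near-identical METHOD_FEATURE_FLAG blocks.
# Assumes objects named processed_data_m<flag> and processed_data_m<flag>_df
# exist, as in the code above; select_common_features is hypothetical.
select_common_features <- function(flag, freq_df, min_count = 3) {
  # Features kept by at least min_count models (the common-importance rule)
  feature_names <- rownames(freq_df[freq_df$Total_Count >= min_count, , drop = FALSE])
  dat <- get(paste0("processed_data_m", flag))
  df  <- get(paste0("processed_data_m", flag, "_df"))
  list(output    = dat[, c("DX", feature_names)],
       output_df = df[, c("DX", feature_names)],
       features  = feature_names)
}
```

A call such as `select_common_features(METHOD_FEATURE_FLAG, df_feature_Output_frequency)` would then replace the repeated `if` blocks, with the printing done once on the returned list.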
print(df_process_frequency_FeatureName)
## [1] "PC1" "cg23432430" "cg09727210" "PC2" "cg00962106" "cg07158503" "cg06697310" "cg02225060" "cg09015880" "cg10701746" "cg16338321" "cg26081710" "cg00415024" "cg21757617" "cg14168080"
## [16] "cg02887598" "cg05064044" "cg01910713" "cg11331837" "cg07504457" "cg00004073" "cg04156077" "cg10738648" "cg07640670" "cg16858433" "cg12543766" "cg20685672" "cg24851651" "cg20678988" "cg03088219"
## [31] "cg16536985" "cg05234269" "cg18285382" "cg09216282" "cg00084271" "cg21697769" "cg15098922" "cg27577781" "cg18150287" "cg08096656" "cg19503462" "cg07634717" "cg26853071" "cg09247979" "cg00154902"
## [46] "cg15184869" "cg19471911" "cg12702014" "cg03979311" "cg11787167" "cg18857647" "cg11540596" "cg25712921" "cg12240569" "cg19301366" "cg25436480" "cg13387643" "cg12421087" "cg11227702" "cg00648024"
## [61] "cg17002719" "cg15633912" "cg16715186" "cg11019791" "cg06880438" "cg03660162" "cg01008088" "cg15535896" "cg15600437" "cg02078724" "cg20823859" "cg13372276" "cg25208881" "cg26679884" "cg01921484"
## [76] "cg06960717" "cg25169289" "cg08584917" "cg22305850" "cg11133939" "cg01608425" "cg06371647" "cg03749159" "cg24697433" "cg21986118" "cg18816397" "cg01128042" "cg15700429" "cg25277809" "cg22931151"
## [91] "cg24634455" "cg13405878" "cg02932958" "cg11286989" "cg05593887" "cg18918831" "cg11247378" "cg24139837" "cg17042243" "cg25879395" "cg18029737" "cg10681981" "cg26846609" "cg14293999" "cg10240127"
## [106] "cg08198851" "cg18993517" "cg02823329" "cg08745107" "cg13573375" "cg17738613" "cg02356645" "cg05876883" "cg24883219" "cg00696044" "cg17131279" "cg08041188" "cg24307368" "cg06961873" "cg05392160"
## [121] "cg26983017" "cg07138269" "cg04316537" "cg27224751" "cg04831745" "cg12556569" "cg17386240" "cg04412904" "cg00345083" "cg02668233" "cg10788927" "cg14687298" "cg14170504" "cg03672288" "cg14307563"
## [136] "cg09451339" "cg16431720" "cg01662749" "cg02495179" "cg04768387" "cg17002338" "cg01933473" "cg16089727" "cg24643105" "PC3" "cg00819121" "cg09120722" "cg27272246" "cg06277607" "cg03982462"
## [151] "cg09584650" "cg08788093" "cg22666875" "cg22542451" "cg00939409" "cg17723206" "cg05321907" "cg12776173" "cg25758034" "cg14710850" "cg23517115" "cg17429539" "cg17906851" "cg00512739" "cg12689021"
## [166] "cg16571124" "cg22071943" "cg25649515" "cg04497611" "cg15730644" "cg13739190" "cg25306893" "cg16779438" "cg06483046" "cg14780448" "cg06833284" "cg14507637" "cg18819889" "cg03549208" "cg15985500"
## [181] "cg05161773" "cg06403901" "cg22169467" "cg08857872" "cg11187460" "cg03600007" "cg05850457" "cg06715136" "cg10091792" "cg03221390" "cg02122327" "cg21139150" "cg14192979" "cg23352245" "cg00146240"
## [196] "cg20981163" "cg27160885" "cg00553601" "cg12146221" "cg13226272" "cg22112152" "cg23836570" "cg08554146" "cg09785377" "cg01462799" "cg06118351" "cg17129965" "cg18339359" "cg11438323" "cg00295418"
## [211] "cg08896901" "cg18526121" "cg02550738" "cg04664583" "cg07028768" "cg01549082" "cg13815695" "cg02627240" "cg19799454" "cg06864789" "cg03737947" "cg14532717" "cg22535849" "cg04718469" "cg14627380"
## [226] "cg10039445" "cg02631626" "cg20673830" "cg17268094" "cg11706829" "cg16733676" "cg20078646" "cg13368637" "cg16652920" "cg26901661" "cg04888234" "cg04242342" "cg00322820" "cg23066280" "cg07480955"
## [241] "cg02772171" "cg21243064" "cg21388339" "cg01153376" "cg15775217" "cg02621446" "cg10666341" "cg23177161" "cg02246922" "cg25174111" "cg00322003" "cg15586958" "cg06231502" "age.now" "cg18949721"
## [256] "cg12228670" "cg11314779" "cg23916408" "cg01280698" "cg04124201" "cg12784167" "cg04645024" "cg16202259" "cg11268585" "cg15501526" "cg03084184" "cg12333628" "cg21783012" "cg13038195" "cg04867412"
## [271] "cg20803293"
Selected_Frequency_Feature_importance <- all_Output_combined_df_impAvg[all_Output_combined_df_impAvg$Total_Count >= 3, ]
print(Selected_Frequency_Feature_importance)
## Feature LRM XGB ENM RF SVM Total_Count Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 1 age.now 0 1 0 1 1 3 0.004163450 0.634974445 0.003392297 0.5289698 0.5000000 0.3343000
## 2 cg00004073 1 1 1 1 1 5 0.269349519 0.252321139 0.368743031 0.3932005 0.3333333 0.3233895
## 3 cg00084271 1 1 1 1 1 5 0.223582217 0.084514434 0.272790932 0.5066986 0.1666667 0.2508506
## 5 cg00146240 1 0 1 1 1 4 0.087293373 0.000000000 0.195466203 0.5233594 0.1666667 0.1945571
## 6 cg00154902 1 1 1 1 1 5 0.205862688 0.357148936 0.223748437 0.4724526 0.3333333 0.3185092
## 7 cg00295418 1 0 1 1 1 4 0.052680354 0.000000000 0.126220928 0.2779946 0.3333333 0.1580459
## 8 cg00322003 1 0 1 0 1 3 0.026025136 0.000000000 0.170919274 0.1854808 0.5000000 0.1764850
## 9 cg00322820 1 0 1 0 1 3 0.082466488 0.000000000 0.112276903 0.1971691 0.3333333 0.1450492
## 10 cg00345083 1 1 1 1 1 5 0.067531780 0.007214598 0.125533467 0.3843604 0.1666667 0.1502614
## 11 cg00415024 1 1 1 1 1 5 0.312750147 0.191638841 0.368625491 0.6225654 0.1666667 0.3324493
## 12 cg00512739 1 0 1 1 1 4 0.182276652 0.000000000 0.247625185 0.2915734 0.3333333 0.2109617
## 13 cg00553601 1 1 1 1 0 4 0.082411954 0.047507352 0.118912401 0.2530610 0.0000000 0.1003786
## 14 cg00648024 1 1 1 1 1 5 0.188256686 0.138389121 0.282665821 0.3302662 0.3333333 0.2545822
## 15 cg00696044 1 1 1 1 1 5 0.086245021 0.155116397 0.210684436 0.4958086 0.1666667 0.2229042
## 17 cg00819121 1 0 1 1 1 4 0.324605878 0.000000000 0.375766891 0.2925906 0.1666667 0.2319260
## 18 cg00939409 1 1 1 0 1 4 0.215333810 0.096720545 0.280167155 0.2332442 0.1666667 0.1984265
## 19 cg00962106 1 1 1 1 1 5 0.411641806 0.324339586 0.520231601 0.2711677 0.8333333 0.4721428
## 20 cg01008088 1 1 1 1 1 5 0.176892321 0.334393663 0.213601915 0.6712800 0.1666667 0.3125669
## 21 cg01128042 1 1 1 1 1 5 0.130128410 0.305711493 0.206130764 0.5172017 0.3333333 0.2985011
## 22 cg01153376 1 1 0 1 0 3 0.054613673 0.064932923 0.093178458 0.2663037 0.1666667 0.1291391
## 23 cg01280698 0 1 0 1 1 3 0.000000000 0.171928832 0.068309707 0.3119038 0.6666667 0.2437618
## 24 cg01462799 1 1 1 1 0 4 0.067347244 0.017976135 0.112488167 0.2345782 0.1666667 0.1198113
## 25 cg01549082 1 1 1 1 0 4 0.036089818 0.289658654 0.116966623 0.4562195 0.0000000 0.1797869
## 26 cg01608425 1 1 1 1 1 5 0.143336624 0.224704929 0.176798440 0.2616919 0.3333333 0.2279731
## 27 cg01662749 1 1 1 1 1 5 0.046585538 0.003576880 0.169671947 0.4083134 0.3333333 0.1922962
## 28 cg01910713 1 1 1 1 1 5 0.288793899 0.036071458 0.346939108 0.2852783 0.3333333 0.2580832
## 29 cg01921484 1 1 1 1 1 5 0.151192434 0.022248999 0.314260237 0.4975570 0.3333333 0.2637184
## 30 cg01933473 1 1 1 1 1 5 0.032966179 0.188410259 0.131296579 0.3895250 0.3333333 0.2151063
## 31 cg02078724 1 1 1 1 1 5 0.170470017 0.100360545 0.230071851 0.4201033 0.5000000 0.2842011
## 32 cg02122327 1 1 1 1 0 4 0.101980457 0.405336662 0.186377937 0.3776111 0.0000000 0.2142612
## 33 cg02225060 1 1 1 1 1 5 0.354934811 0.014823116 0.478455851 0.5342408 0.1666667 0.3098243
## 34 cg02246922 1 0 0 1 1 3 0.030685944 0.000000000 0.101537140 0.3343261 0.3333333 0.1599765
## 35 cg02356645 1 1 1 1 1 5 0.089543386 0.067243906 0.111478962 0.2928698 0.5000000 0.2122272
## 37 cg02495179 1 1 1 1 1 5 0.041302003 0.182002026 0.131585203 0.3799953 0.3333333 0.2136436
## 38 cg02550738 1 1 1 0 1 4 0.042141616 0.020463221 0.118472357 0.1533466 0.3333333 0.1335514
## 39 cg02621446 1 1 0 1 0 3 0.049579364 0.031024055 0.094051394 0.2771201 0.1666667 0.1236883
## 40 cg02627240 1 1 0 1 1 4 0.025034816 0.291605934 0.085034592 0.3097444 0.1666667 0.1756173
## 41 cg02631626 0 1 1 1 1 4 0.000000000 0.018210136 0.164542120 0.2789742 0.3333333 0.1590120
## 43 cg02668233 1 1 1 1 1 5 0.065997478 0.031214253 0.171125735 0.4860534 0.3333333 0.2175448
## 44 cg02772171 1 1 1 0 0 3 0.074944907 0.008455338 0.136723512 0.1987788 0.1666667 0.1171138
## 45 cg02823329 1 1 1 1 1 5 0.095665171 0.066658022 0.237420190 0.3912554 0.3333333 0.2248664
## 46 cg02887598 1 1 1 1 1 5 0.298943556 0.068421117 0.373777071 0.4329452 0.3333333 0.3014841
## 47 cg02932958 1 1 1 1 1 5 0.119976765 0.023889341 0.208520484 0.5167944 0.5000000 0.2738362
## 48 cg03084184 0 1 0 1 1 3 0.000000000 0.044329829 0.015641222 0.4458309 0.5000000 0.2011604
## 49 cg03088219 1 1 1 1 1 5 0.242526726 0.386706306 0.326265900 0.3305586 0.1666667 0.2905448
## 50 cg03221390 1 1 1 0 1 4 0.107855280 0.094349016 0.133511928 0.2168214 0.1666667 0.1438409
## 53 cg03549208 1 1 1 0 1 4 0.144872243 0.013354417 0.184090016 0.1924605 0.3333333 0.1736221
## 54 cg03600007 1 1 1 0 1 4 0.124581642 0.133851634 0.249180824 0.1581174 0.6666667 0.2664796
## 55 cg03660162 1 1 1 1 1 5 0.178105731 0.520174028 0.357981700 0.3014357 0.5000000 0.3715394
## 56 cg03672288 1 1 1 1 1 5 0.057943034 0.110735242 0.208194157 0.4151766 0.1666667 0.1917431
## 57 cg03737947 0 1 1 1 1 4 0.016829061 0.272380669 0.137723494 0.5336092 0.1666667 0.2254418
## 58 cg03749159 1 1 1 1 1 5 0.136655025 0.327177056 0.189032603 0.7288038 0.3333333 0.3430004
## 59 cg03979311 1 1 1 1 1 5 0.197215429 0.040125753 0.316823108 0.2858039 0.3333333 0.2346603
## 60 cg03982462 1 0 1 1 1 4 0.260179978 0.000000000 0.364985173 0.4805497 0.1666667 0.2544763
## 62 cg04124201 0 1 0 1 1 3 0.000000000 0.161334846 0.047890497 0.2991655 0.3333333 0.1683448
## 63 cg04156077 1 1 1 1 1 5 0.267989586 0.278838990 0.289741411 0.3698890 0.5000000 0.3412918
## 64 cg04242342 1 0 1 0 1 3 0.083050748 0.000000000 0.146924727 0.1305678 0.3333333 0.1387753
## 65 cg04316537 1 1 1 1 1 5 0.076660051 0.072015864 0.197150638 0.5322393 0.1666667 0.2089465
## 66 cg04412904 1 1 1 1 1 5 0.067907761 0.008001825 0.153993774 0.3323540 0.3333333 0.1791181
## 68 cg04497611 1 1 1 0 1 4 0.169004430 0.168998579 0.245534890 0.1676654 0.1666667 0.1835740
## 70 cg04645024 0 1 0 1 1 3 0.000000000 0.089605477 0.035342163 0.4314151 0.1666667 0.1446059
## 71 cg04664583 1 0 1 1 1 4 0.039767751 0.000000000 0.132274360 0.3593538 0.1666667 0.1396125
## 72 cg04718469 0 1 1 1 1 4 0.000000000 0.073525158 0.120957256 0.3833990 0.5000000 0.2155763
## 73 cg04768387 1 1 1 1 1 5 0.041259130 0.007136179 0.110328828 0.6617940 0.3333333 0.2307703
## 74 cg04831745 1 1 1 1 1 5 0.072362310 0.163128841 0.170924298 0.4914946 0.3333333 0.2462487
## 75 cg04867412 0 1 0 1 1 3 0.009952011 0.006644894 0.051653712 0.3521789 0.3333333 0.1507526
## 76 cg04888234 1 1 1 0 0 3 0.085173667 0.011359898 0.158877410 0.2305396 0.1666667 0.1305235
## 77 cg05064044 1 1 1 1 1 5 0.298161376 0.083498934 0.370533860 0.4486357 0.1666667 0.2734993
## 79 cg05161773 1 1 1 0 1 4 0.134312114 0.005676604 0.243185340 0.1921429 0.3333333 0.1817300
## 80 cg05234269 1 1 1 1 1 5 0.236394701 0.117738693 0.303552770 0.3708301 0.3333333 0.2723699
## 81 cg05321907 1 0 1 1 1 4 0.211489718 0.000000000 0.232925760 0.3334943 0.1666667 0.1889153
## 82 cg05392160 1 1 1 1 1 5 0.081302245 0.165992808 0.126803158 0.2377531 0.3333333 0.1890369
## 83 cg05593887 1 1 1 1 1 5 0.109637113 0.008367983 0.222451327 0.2341437 0.3333333 0.1815867
## 84 cg05850457 1 1 0 1 1 4 0.118261095 0.028639870 0.050214597 0.4977309 0.1666667 0.1723026
## 85 cg05876883 1 1 1 1 1 5 0.088595752 0.006101381 0.238365969 0.2767240 0.3333333 0.1886241
## 88 cg06118351 1 0 1 1 1 4 0.062151507 0.000000000 0.178593029 0.4821947 0.1666667 0.1779212
## [ reached 'max' / getOption("max.print") -- omitted 195 rows ]
# Output data frame with selected features based on the mean method:
# "selected_impAvg_ordered_NAME". This data frame does not have a column named "SampleID".
if (Flag_8mean) {
  filename_mean <- paste0("Selected_mean", "_", INPUT_NUMBER_FEATURES, "_Features.csv")
  OUTPUTPATH_mean <- paste0(OUTUT_CSV_PATHNAME, filename_mean)
  if (file.exists(OUTPUTPATH_mean)) {
    print("selected file based on mean already exists")
  } else {
    write.csv(df_selected_Mean,
              file = OUTPUTPATH_mean,
              row.names = FALSE)
  }
}
if (Flag_8median) {
  filename_median <- paste0("Selected_median", "_", INPUT_NUMBER_FEATURES, "_Features.csv")
  OUTPUTPATH_median <- paste0(OUTUT_CSV_PATHNAME, filename_median)
  if (file.exists(OUTPUTPATH_median)) {
    print("selected file based on median already exists")
  } else {
    write.csv(df_selected_Median,
              file = OUTPUTPATH_median,
              row.names = FALSE)
  }
}
if (Flag_8Fequency) {
  filename_frequency <- paste0("Selected_frequency", "_", INPUT_NUMBER_FEATURES, "_Features.csv")
  OUTPUTPATH_frequency <- paste0(OUTUT_CSV_PATHNAME, filename_frequency)
  if (file.exists(OUTPUTPATH_frequency)) {
    print("selected file based on frequency already exists")
  } else {
    write.csv(df_process_Output_freq,
              file = OUTPUTPATH_frequency,
              row.names = FALSE)
  }
}
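The three blocks above repeat the same write-if-absent pattern and differ only in the method label and the data frame being written. They could be collapsed into a small helper; this is a sketch, and the helper name `write_selected_csv` is illustrative (it assumes `OUTUT_CSV_PATHNAME` and `INPUT_NUMBER_FEATURES` are defined as in the surrounding chunks):

```r
# Hypothetical helper factoring out the write-if-absent pattern used above.
write_selected_csv <- function(df, method_label) {
  filename <- paste0("Selected_", method_label, "_", INPUT_NUMBER_FEATURES, "_Features.csv")
  outpath  <- paste0(OUTUT_CSV_PATHNAME, filename)
  if (file.exists(outpath)) {
    print(paste("selected file based on", method_label, "already exists"))
  } else {
    write.csv(df, file = outpath, row.names = FALSE)
  }
}

# Usage, guarded by the same flags as above:
if (Flag_8mean)     write_selected_csv(df_selected_Mean, "mean")
if (Flag_8median)   write_selected_csv(df_selected_Median, "median")
if (Flag_8Fequency) write_selected_csv(df_process_Output_freq, "frequency")
```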
# This is the flag for the phenotype data output.
# If set to TRUE, check whether the file already exists at the given path; if not, write the file, otherwise do nothing.
# If set to FALSE, do not output the phenotype file.
# NOTICE: the phenotype file is selected from "Merged_df".
phenotypeDF<-merged_df_raw[,colnames(phenoticPart_RAW)]
print(head(phenotypeDF))
## barcodes RID.a prop.B prop.NK prop.CD4T prop.CD8T prop.Mono prop.Neutro prop.Eosino DX age.now PTGENDER ABETA TAU PTAU PC1 PC2
## 200223270003_R02C01 200223270003_R02C01 2190 0.03164651 0.03609239 0.010771839 0.01481567 0.06533409 0.8413395 0 MCI 82.40000 Male 963.2 341.5 35.48 -0.214185447 1.470293e-02
## 200223270003_R03C01 200223270003_R03C01 4080 0.03556363 0.04697771 0.002321312 0.06381941 0.04901806 0.8022999 0 CN 78.60000 Female 950.6 295.9 28.08 -0.172761185 5.745834e-02
## 200223270003_R06C01 200223270003_R06C01 4505 0.07129589 0.04412218 0.037684081 0.11457236 0.08745402 0.6448715 0 CN 80.40000 Female 1705.0 353.2 28.49 -0.003667305 8.372861e-02
## 200223270003_R07C01 200223270003_R07C01 1010 0.02081699 0.07117668 0.040966085 0.00000000 0.04459325 0.8224470 0 Dementia 78.16441 Male 493.3 272.8 22.75 -0.186779607 -1.117250e-02
## 200223270006_R01C01 200223270006_R01C01 4226 0.02680465 0.04767947 0.128514873 0.09085886 0.07419209 0.6319501 0 MCI 62.90000 Female 1705.0 253.1 22.84 0.026814649 1.650735e-05
## 200223270006_R04C01 200223270006_R04C01 1190 0.07063013 0.05250647 0.064529118 0.04309168 0.08796080 0.6812818 0 CN 80.67796 Female 1336.0 439.3 40.78 -0.037862929 1.571950e-02
## PC3 ageGroup ageGroupsq DX_num uniqueID Horvath
## 200223270003_R02C01 -0.014043316 0.6606949 0.43651772 0 1 61.50365
## 200223270003_R03C01 0.005055871 0.2806949 0.07878961 0 1 69.26678
## 200223270003_R06C01 0.029143653 0.4606949 0.21223977 0 1 96.84418
## 200223270003_R07C01 -0.032302430 0.2371357 0.05623333 1 1 61.76446
## 200223270006_R01C01 0.052947950 -1.2893051 1.66230770 0 1 59.33885
## 200223270006_R04C01 -0.008685676 0.4884909 0.23862336 0 1 70.27197
OUTPUTPATH_phenotypePart <- paste0(OUTUT_CSV_PATHNAME, "PhenotypePart_df.csv")
if (phenoOutPUt_FLAG) {
  if (file.exists(OUTPUTPATH_phenotypePart)) {
    print("Phenotype File already exists")
  } else {
    write.csv(phenotypeDF, file = OUTPUTPATH_phenotypePart, row.names = FALSE)
  }
}
## [1] "Phenotype File already exists"
Performance of the selected output features based on the mean method
processed_dataFrame<-df_selected_Mean
processed_data<-output_mean_process
AfterProcess_FeatureName<-selected_impAvg_ordered_NAME
print(head(output_mean_process))
## # A tibble: 6 × 251
## DX cg23432430 PC3 PC2 cg00962106 PC1 cg07158503 cg06697310 cg11331837 cg07634717 cg03660162 cg24851651 cg11019791 cg20685672 cg26081710 cg14168080 cg03749159 cg20678988 cg04156077
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CI 0.948 -0.0140 1.47e-2 0.912 -0.214 0.578 0.845 0.0369 0.748 0.869 0.0367 0.811 0.671 0.875 0.419 0.936 0.844 0.732
## 2 CN 0.946 0.00506 5.75e-2 0.538 -0.173 0.620 0.865 0.572 0.825 0.516 0.0536 0.783 0.793 0.920 0.442 0.915 0.855 0.687
## 3 CN 0.942 0.0291 8.37e-2 0.504 -0.00367 0.624 0.241 0.0318 0.818 0.903 0.0597 0.435 0.661 0.880 0.436 0.926 0.779 0.850
## 4 CI 0.943 -0.0323 -1.12e-2 0.904 -0.187 0.599 0.848 0.0383 0.758 0.531 0.609 0.850 0.808 0.915 0.957 0.629 0.826 0.680
## 5 CI 0.946 0.0529 1.65e-5 0.896 0.0268 0.631 0.821 0.930 0.826 0.926 0.0883 0.854 0.0829 0.917 0.946 0.929 0.330 0.891
## 6 CN 0.951 -0.00869 1.57e-2 0.886 -0.0379 0.615 0.784 0.540 0.210 0.894 0.919 0.738 0.845 0.923 0.399 0.612 0.854 0.837
## # ℹ 232 more variables: cg19503462 <dbl>, cg26853071 <dbl>, age.now <dbl>, cg11540596 <dbl>, cg00415024 <dbl>, cg10701746 <dbl>, cg00004073 <dbl>, cg11227702 <dbl>, cg19471911 <dbl>,
## # cg09727210 <dbl>, cg00154902 <dbl>, cg17002719 <dbl>, cg07504457 <dbl>, cg25879395 <dbl>, cg01008088 <dbl>, cg02225060 <dbl>, cg12543766 <dbl>, cg09120722 <dbl>, cg11787167 <dbl>,
## # cg19799454 <dbl>, cg02887598 <dbl>, cg01128042 <dbl>, cg21697769 <dbl>, cg25208881 <dbl>, cg16779438 <dbl>, cg17386240 <dbl>, cg03088219 <dbl>, cg24883219 <dbl>, cg15535896 <dbl>,
## # cg16338321 <dbl>, cg21757617 <dbl>, cg18285382 <dbl>, cg17429539 <dbl>, cg10738648 <dbl>, cg02078724 <dbl>, cg09015880 <dbl>, cg20823859 <dbl>, cg18816397 <dbl>, cg16431720 <dbl>,
## # cg06833284 <dbl>, cg23517115 <dbl>, cg11438323 <dbl>, cg02932958 <dbl>, cg08096656 <dbl>, cg05064044 <dbl>, cg05234269 <dbl>, cg25169289 <dbl>, cg14710850 <dbl>, cg26679884 <dbl>,
## # cg03600007 <dbl>, cg15098922 <dbl>, cg01921484 <dbl>, cg16715186 <dbl>, cg06961873 <dbl>, cg12240569 <dbl>, cg01910713 <dbl>, cg25712921 <dbl>, cg00648024 <dbl>, cg03982462 <dbl>,
## # cg08745107 <dbl>, cg26983017 <dbl>, cg00084271 <dbl>, cg16858433 <dbl>, cg06371647 <dbl>, cg26846609 <dbl>, cg15184869 <dbl>, cg13573375 <dbl>, cg04831745 <dbl>, cg22931151 <dbl>, …
print(selected_impAvg_ordered_NAME)
## [1] "cg23432430" "PC3" "PC2" "cg00962106" "PC1" "cg07158503" "cg06697310" "cg11331837" "cg07634717" "cg03660162" "cg24851651" "cg11019791" "cg20685672" "cg26081710" "cg14168080"
## [16] "cg03749159" "cg20678988" "cg04156077" "cg19503462" "cg26853071" "age.now" "cg11540596" "cg00415024" "cg10701746" "cg00004073" "cg11227702" "cg19471911" "cg09727210" "cg00154902" "cg17002719"
## [31] "cg07504457" "cg25879395" "cg01008088" "cg02225060" "cg12543766" "cg09120722" "cg11787167" "cg19799454" "cg02887598" "cg01128042" "cg21697769" "cg25208881" "cg16779438" "cg17386240" "cg03088219"
## [46] "cg24883219" "cg15535896" "cg16338321" "cg21757617" "cg18285382" "cg17429539" "cg10738648" "cg02078724" "cg09015880" "cg20823859" "cg18816397" "cg16431720" "cg06833284" "cg23517115" "cg11438323"
## [61] "cg02932958" "cg08096656" "cg05064044" "cg05234269" "cg25169289" "cg14710850" "cg26679884" "cg03600007" "cg15098922" "cg01921484" "cg16715186" "cg06961873" "cg12240569" "cg01910713" "cg25712921"
## [76] "cg00648024" "cg03982462" "cg08745107" "cg26983017" "cg00084271" "cg16858433" "cg06371647" "cg26846609" "cg15184869" "cg13573375" "cg04831745" "cg22931151" "cg18918831" "cg07640670" "cg15600437"
## [91] "cg01280698" "cg12689021" "cg27577781" "cg13405878" "cg22666875" "cg16536985" "cg16202259" "cg18857647" "cg22305850" "cg27224751" "cg09247979" "cg12333628" "cg16571124" "cg03979311" "cg12421087"
## [106] "cg15700429" "cg13739190" "cg00819121" "cg25436480" "cg04768387" "cg24634455" "cg11133939" "cg17042243" "cg22542451" "cg01608425" "cg06864789" "cg06880438" "cg13387643" "cg12702014" "cg03737947"
## [121] "cg02823329" "cg00696044" "cg06960717" "cg20673830" "cg25649515" "cg10681981" "cg15633912" "cg02668233" "cg27272246" "cg18150287" "cg18339359" "cg04718469" "cg01933473" "cg02122327" "cg18993517"
## [136] "cg02495179" "cg02356645" "cg09216282" "cg09584650" "cg00512739" "cg23352245" "cg12776173" "cg19301366" "cg25758034" "cg04316537" "cg14687298" "cg13226272" "cg13372276" "cg12556569" "cg06277607"
## [151] "cg17002338" "cg24307368" "cg14627380" "cg10091792" "cg08584917" "cg18819889" "cg24697433" "cg03084184" "cg23159970" "cg22112152" "cg12784167" "cg08198851" "cg17129965" "cg00939409" "cg08788093"
## [166] "cg09451339" "cg20078646" "cg10788927" "cg16089727" "cg00146240" "cg15775217" "cg18526121" "cg01662749" "cg14192979" "cg03672288" "cg25306893" "cg05392160" "cg05321907" "cg25277809" "cg05876883"
## [181] "cg06715136" "cg06483046" "cg14307563" "cg14170504" "cg04497611" "cg24139837" "cg05161773" "cg05593887" "cg11286989" "cg10240127" "cg27160885" "cg01549082" "cg04412904" "cg14532717" "cg06118351"
## [196] "cg22535849" "cg11706829" "cg00322003" "cg08554146" "cg02627240" "cg18029737" "cg17723206" "cg03549208" "cg21986118" "cg05850457" "cg09785377" "cg14293999" "cg07138269" "cg15985500" "cg14780448"
## [211] "cg04124201" "cg17738613" "cg17906851" "cg22169467" "cg22071943" "cg20981163" "cg10039445" "cg02246922" "cg08896901" "cg02631626" "cg11247378" "cg08857872" "cg00295418" "cg14507637" "cg18949721"
## [226] "cg11187460" "cg12146221" "cg08041188" "cg04867412" "cg00345083" "cg11268585" "cg21388339" "cg12228670" "cg23916408" "cg26901661" "cg21243064" "cg06403901" "cg15730644" "cg00322820" "cg04645024"
## [241] "cg24643105" "cg03221390" "cg21139150" "cg17131279" "cg15501526" "cg13653328" "cg24470466" "cg23836570" "cg13038195" "cg04664583"
print(head(df_selected_Mean))
## DX cg23432430 PC3 PC2 cg00962106 PC1 cg07158503 cg06697310 cg11331837 cg07634717 cg03660162 cg24851651 cg11019791 cg20685672 cg26081710 cg14168080
## 200223270003_R02C01 CI 0.9482702 -0.014043316 0.01470293 0.9124898 -0.214185447 0.5777146 0.8454609 0.03692842 0.7483382 0.8691767 0.03674702 0.8112324 0.6712101 0.8751040 0.4190123
## 200223270003_R03C01 CN 0.9455418 0.005055871 0.05745834 0.5375751 -0.172761185 0.6203543 0.8653044 0.57150125 0.8254434 0.5160770 0.05358297 0.7831231 0.7932091 0.9198212 0.4420256
## 200223270003_R06C01 CN 0.9418716 0.029143653 0.08372861 0.5040948 -0.003667305 0.6236025 0.2405168 0.03182862 0.8181246 0.9026304 0.05968923 0.4353250 0.6613646 0.8801892 0.4355521
## cg03749159 cg20678988 cg04156077 cg19503462 cg26853071 age.now cg11540596 cg00415024 cg10701746 cg00004073 cg11227702 cg19471911 cg09727210 cg00154902 cg17002719 cg07504457
## 200223270003_R02C01 0.9355921 0.8438718 0.7321883 0.7951675 0.4233820 82.4 0.9238951 0.4299553 0.4795503 0.02928535 0.86486075 0.6334393 0.4240111 0.5137741 0.04939181 0.7116230
## 200223270003_R03C01 0.9153921 0.8548886 0.6865805 0.4537684 0.7451354 78.6 0.8926595 0.3999122 0.4868342 0.02787198 0.49184121 0.8437175 0.8812928 0.8540746 0.40466475 0.6854539
## 200223270003_R06C01 0.9255807 0.7786685 0.8501188 0.6997359 0.4228079 80.4 0.8820252 0.7465084 0.4927257 0.64576463 0.02543724 0.6127952 0.8493743 0.8188126 0.51428089 0.7205633
## cg25879395 cg01008088 cg02225060 cg12543766 cg09120722 cg11787167 cg19799454 cg02887598 cg01128042 cg21697769 cg25208881 cg16779438 cg17386240 cg03088219 cg24883219 cg15535896
## 200223270003_R02C01 0.88130864 0.8424817 0.6828159 0.51028134 0.5878977 0.03853894 0.9178930 0.04020908 0.9113420 0.8946108 0.1851956 0.8826150 0.7473400 0.844002862 0.6430473 0.3382952
## 200223270003_R03C01 0.02603438 0.2417656 0.8265195 0.88741539 0.8287506 0.04673831 0.9106247 0.67073881 0.5328806 0.2822953 0.9092286 0.5466924 0.7144809 0.007435243 0.6822115 0.9253926
## 200223270003_R06C01 0.91060615 0.2618620 0.5209552 0.02818501 0.8793344 0.32564508 0.9066551 0.73408417 0.5222757 0.8698740 0.9265502 0.8629492 0.8074824 0.120155222 0.5296903 0.3320191
## cg16338321 cg21757617 cg18285382 cg17429539 cg10738648 cg02078724 cg09015880 cg20823859 cg18816397 cg16431720 cg06833284 cg23517115 cg11438323 cg02932958 cg08096656 cg05064044
## 200223270003_R02C01 0.5350242 0.03652647 0.3202927 0.7860900 0.44931577 0.3096774 0.5101716 0.9030711 0.5472925 0.7356099 0.9125144 0.2151144 0.4863471 0.7901008 0.9362594 0.5672851
## 200223270003_R03C01 0.8294062 0.44299089 0.2930577 0.7100923 0.49894016 0.2896133 0.8402106 0.6062985 0.4940355 0.8692449 0.9003482 0.9131440 0.8984559 0.4210489 0.9314878 0.5358875
## 200223270003_R06C01 0.4918708 0.44725379 0.8923595 0.7660838 0.05552024 0.2805612 0.8472063 0.8917348 0.5337018 0.8773137 0.6097933 0.8328364 0.8722772 0.3825995 0.4943033 0.5273964
## cg05234269 cg25169289 cg14710850 cg26679884 cg03600007 cg15098922 cg01921484 cg16715186 cg06961873 cg12240569 cg01910713 cg25712921 cg00648024 cg03982462 cg08745107 cg26983017
## 200223270003_R02C01 0.93848584 0.1100884 0.8048592 0.6793815 0.5658487 0.9286092 0.9098550 0.2742789 0.5335591 0.82772064 0.8573169 0.2829848 0.51410972 0.8562777 0.02921338 0.89868232
## 200223270003_R03C01 0.57461229 0.7667174 0.8090950 0.1848705 0.6018832 0.9027517 0.9093137 0.7946153 0.5472606 0.02690547 0.8538850 0.6220919 0.40202875 0.6023731 0.78542320 0.03145466
## 200223270003_R06C01 0.02467208 0.2264993 0.8285902 0.1701734 0.8611166 0.8525611 0.9204487 0.8124316 0.9415177 0.46030640 0.8110366 0.6384003 0.05579011 0.8778458 0.02709928 0.84677625
## cg00084271 cg16858433 cg06371647 cg26846609 cg15184869 cg13573375 cg04831745 cg22931151 cg18918831 cg07640670 cg15600437 cg01280698 cg12689021 cg27577781 cg13405878 cg22666875
## 200223270003_R02C01 0.8103611 0.9184356 0.8336894 0.48860949 0.8622328 0.8670419 0.61984995 0.9311023 0.4891660 0.58296513 0.4885353 0.8985067 0.7706828 0.8143535 0.4549662 0.8177182
## 200223270003_R03C01 0.7877006 0.9194211 0.8198684 0.04878986 0.8996252 0.1733934 0.71214149 0.9356702 0.5333801 0.55225610 0.4894487 0.8846201 0.7449475 0.8113185 0.7858042 0.8291957
## 200223270003_R06C01 0.7706165 0.9271632 0.8069537 0.48026945 0.8688117 0.8888246 0.06871768 0.9328614 0.6406575 0.04058533 0.8551374 0.8847132 0.7872237 0.8144274 0.7583938 0.3694180
## cg16536985 cg16202259 cg18857647 cg22305850 cg27224751 cg09247979 cg12333628 cg16571124 cg03979311 cg12421087 cg15700429 cg13739190 cg00819121 cg25436480 cg04768387 cg24634455
## 200223270003_R02C01 0.5789643 0.9548726 0.8582332 0.03361934 0.44503947 0.5070956 0.9227884 0.9282854 0.86644909 0.5647607 0.7879010 0.8510103 0.9207001 0.8425160 0.3131047 0.7796391
## 200223270003_R03C01 0.5418687 0.3713483 0.8394132 0.57522232 0.03214912 0.5706177 0.9092861 0.9206431 0.06199853 0.5399655 0.9114530 0.8358482 0.9281472 0.4994032 0.9465814 0.5188241
## 200223270003_R06C01 0.8392044 0.4852461 0.2647491 0.58548744 0.83123722 0.5090215 0.5084647 0.9276842 0.72615553 0.5400348 0.8838233 0.8419471 0.9327211 0.3494312 0.9098563 0.5325725
## cg11133939 cg17042243 cg22542451 cg01608425 cg06864789 cg06880438 cg13387643 cg12702014 cg03737947 cg02823329 cg00696044 cg06960717 cg20673830 cg25649515 cg10681981 cg15633912
## 200223270003_R02C01 0.1282694 0.2502905 0.5884356 0.9030410 0.05369415 0.8285145 0.4229959 0.7704049 0.91824910 0.9462397 0.55608424 0.7030978 0.2422052 0.9279829 0.7035090 0.1605530
## 200223270003_R03C01 0.5920898 0.2933475 0.8337068 0.9264388 0.46053125 0.7988881 0.4200273 0.7848681 0.92067153 0.6464005 0.07552381 0.7653402 0.6881735 0.9235753 0.7382662 0.9333421
## 200223270003_R06C01 0.5127706 0.2725457 0.8125084 0.8887753 0.87513655 0.7839538 0.4161488 0.8065993 0.03638091 0.9633930 0.79270858 0.7206218 0.2134634 0.5895839 0.6971989 0.8737362
## cg02668233 cg27272246 cg18150287 cg18339359 cg04718469 cg01933473 cg02122327 cg18993517 cg02495179 cg02356645 cg09216282 cg09584650 cg00512739 cg23352245 cg12776173 cg19301366
## 200223270003_R02C01 0.4708431 0.8615873 0.7685695 0.8824858 0.8687522 0.2589014 0.38940091 0.2091538 0.6813307 0.5105903 0.9349248 0.08230254 0.9337648 0.9377232 0.1038804 0.8831393
## 200223270003_R03C01 0.8841930 0.8705287 0.7519166 0.9040272 0.7256813 0.6726133 0.37769608 0.2665896 0.7373055 0.5833923 0.9244259 0.09661586 0.8863895 0.9375774 0.8730635 0.8072679
## 200223270003_R06C01 0.4575646 0.8103777 0.2501173 0.8552121 0.8521881 0.2642560 0.04017909 0.2574003 0.5588114 0.5701428 0.9263996 0.52399749 0.9242748 0.5932742 0.7009491 0.8796022
## cg25758034 cg04316537 cg14687298 cg13226272 cg13372276 cg12556569 cg06277607 cg17002338 cg24307368 cg14627380 cg10091792 cg08584917 cg18819889 cg24697433 cg03084184 cg23159970
## 200223270003_R02C01 0.6114028 0.8074830 0.04206702 0.02637249 0.04888111 0.06218231 0.10744587 0.9286251 0.64323677 0.9455369 0.8670733 0.5663205 0.9156157 0.9243095 0.8162981 0.61817246
## 200223270003_R03C01 0.6649219 0.8453340 0.14813581 0.54100016 0.62396373 0.03924599 0.09353494 0.2684163 0.34980461 0.9258964 0.5864221 0.9019732 0.9004455 0.6808390 0.7877128 0.57492600
## 200223270003_R06C01 0.2393844 0.4351695 0.24260002 0.44370701 0.59693465 0.48636893 0.09504696 0.2811103 0.02720398 0.5789898 0.6087997 0.9187789 0.9054439 0.6384606 0.4546397 0.03288909
## cg22112152 cg12784167 cg08198851 cg17129965 cg00939409 cg08788093 cg09451339 cg20078646 cg10788927 cg16089727 cg00146240 cg15775217 cg18526121 cg01662749 cg14192979 cg03672288
## 200223270003_R02C01 0.8476101 0.81503498 0.6578905 0.8972140 0.2652180 0.03911678 0.2243746 0.06198170 0.8973154 0.86748697 0.6336151 0.5707441 0.4519781 0.3506201 0.06336040 0.9235592
## 200223270003_R03C01 0.8014136 0.02811410 0.6578186 0.8806673 0.8882671 0.60934160 0.2340702 0.89537412 0.2021398 0.54996692 0.8957183 0.9168327 0.4762313 0.2510946 0.06019651 0.6718625
## 200223270003_R06C01 0.7897897 0.03073269 0.1272153 0.8857237 0.8842646 0.88380243 0.8921284 0.08725521 0.2053075 0.05876736 0.1433218 0.6042521 0.4833367 0.8061480 0.52114282 0.9007629
## cg25306893 cg05392160 cg05321907 cg25277809 cg05876883 cg06715136 cg06483046 cg14307563 cg14170504 cg04497611 cg24139837 cg05161773 cg05593887 cg11286989 cg10240127 cg27160885
## 200223270003_R02C01 0.6265392 0.9328933 0.2880477 0.1632342 0.9039064 0.3400192 0.04383925 0.1855966 0.54915621 0.9086359 0.07404605 0.4120912 0.5939220 0.7590008 0.9250553 0.2231606
## 200223270003_R03C01 0.8330282 0.2576881 0.1782629 0.4913711 0.9223308 0.9259109 0.50720277 0.8916957 0.02236650 0.8818513 0.04183445 0.4154907 0.5766550 0.8533989 0.9403255 0.8263885
## 200223270003_R06C01 0.6175380 0.8920726 0.8427929 0.5952124 0.4697980 0.9079807 0.89604910 0.8750052 0.02988245 0.5853116 0.05657120 0.8526849 0.9148338 0.7313884 0.9056974 0.2121179
## cg01549082 cg04412904 cg14532717 cg06118351 cg22535849 cg11706829 cg00322003 cg08554146 cg02627240 cg18029737 cg17723206 cg03549208 cg21986118 cg05850457 cg09785377 cg14293999
## 200223270003_R02C01 0.2924138 0.05088595 0.5732280 0.3633940 0.8847704 0.8897234 0.1759911 0.8982080 0.66706843 0.9100454 0.92881042 0.9014487 0.6658175 0.8183013 0.9162088 0.2836710
## 200223270003_R03C01 0.7065693 0.07717659 0.1107638 0.4714860 0.8609966 0.5444785 0.5702070 0.8963074 0.57129408 0.9016634 0.48556255 0.8381784 0.6571296 0.8313023 0.9226292 0.9172023
## 200223270003_R06C01 0.2895440 0.08253743 0.6273416 0.8655962 0.8808022 0.5669449 0.3077122 0.8213878 0.05309659 0.7376586 0.01765023 0.9097817 0.7034445 0.8161364 0.6405193 0.9168166
## cg07138269 cg15985500 cg14780448 cg04124201 cg17738613 cg17906851 cg22169467 cg22071943 cg20981163 cg10039445 cg02246922 cg08896901 cg02631626 cg11247378 cg08857872 cg00295418
## 200223270003_R02C01 0.5002290 0.8555262 0.9119141 0.8686421 0.6879612 0.9488392 0.3095010 0.8705217 0.8990628 0.8833873 0.7301201 0.3581911 0.6280766 0.1591185 0.3395280 0.44954665
## 200223270003_R03C01 0.9426707 0.8312198 0.6702102 0.3308589 0.6582258 0.9529718 0.2978585 0.2442648 0.9264076 0.8954055 0.9447019 0.2467071 0.1951736 0.7874849 0.8181845 0.48471295
## 200223270003_R06C01 0.5057781 0.8492103 0.6207355 0.3241613 0.1022257 0.6462151 0.8955853 0.2644581 0.4874651 0.8832807 0.7202230 0.9225209 0.2699849 0.4807942 0.2970779 0.02004532
## cg14507637 cg18949721 cg11187460 cg12146221 cg08041188 cg04867412 cg00345083 cg11268585 cg21388339 cg12228670 cg23916408 cg26901661 cg21243064 cg06403901 cg15730644 cg00322820
## 200223270003_R02C01 0.9051258 0.2334245 0.03672179 0.2049284 0.7752456 0.04304823 0.47960968 0.2521544 0.2756268 0.8632174 0.1942275 0.8951971 0.5191606 0.92790690 0.4803181 0.4869764
## 200223270003_R03C01 0.9009460 0.2437792 0.92516409 0.1814927 0.3201255 0.87967997 0.50833875 0.8535791 0.2102269 0.8496212 0.9154993 0.8754981 0.9167649 0.04783341 0.4353906 0.4858988
## 200223270003_R06C01 0.9013686 0.2523095 0.03109553 0.8619250 0.7900939 0.44971146 0.03929249 0.9121931 0.7649181 0.8738949 0.8886255 0.9021064 0.4862205 0.05253626 0.8763048 0.4754313
## cg04645024 cg24643105 cg03221390 cg21139150 cg17131279 cg15501526 cg13653328 cg24470466 cg23836570 cg13038195 cg04664583
## 200223270003_R02C01 0.7366541 0.5303418 0.5859063 0.01853264 0.1900637 0.6362531 0.9245434 0.7725300 0.58688450 0.45882213 0.5572814
## 200223270003_R03C01 0.8454827 0.5042688 0.9180706 0.43223243 0.7048637 0.6319253 0.5122938 0.9041432 0.54259383 0.02740132 0.5881190
## 200223270003_R06C01 0.0871902 0.9383050 0.6399867 0.43772680 0.1492861 0.7435100 0.9362798 0.1206738 0.03267304 0.46284376 0.9352717
## [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
df_LRM1<-processed_data
featureName_LRM1<-AfterProcess_FeatureName
library(glmnet)
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 454 251
dim(testData)
## [1] 194 251
ctrl <- trainControl(method = "cv", number = 5)
model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM1, newdata = testData,type="raw")
cm_FeatEval_Mean_LRM1<-caret::confusionMatrix(predictions, testData$DX)
print(cm_FeatEval_Mean_LRM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 120 20
## CN 8 46
##
## Accuracy : 0.8557
## 95% CI : (0.7982, 0.9019)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 5.787e-10
##
## Kappa : 0.6637
##
## Mcnemar's Test P-Value : 0.03764
##
## Sensitivity : 0.9375
## Specificity : 0.6970
## Pos Pred Value : 0.8571
## Neg Pred Value : 0.8519
## Prevalence : 0.6598
## Detection Rate : 0.6186
## Detection Prevalence : 0.7216
## Balanced Accuracy : 0.8172
##
## 'Positive' Class : CI
##
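Beyond overall accuracy and kappa (extracted below), the per-class statistics printed above can also be pulled directly out of the `confusionMatrix` object; a short sketch using the `cm_FeatEval_Mean_LRM1` object created above:

```r
# Sensitivity, specificity, and balanced accuracy live in the byClass slot;
# accuracy, kappa, and the accuracy CI bounds live in the overall slot.
print(cm_FeatEval_Mean_LRM1$byClass[c("Sensitivity", "Specificity", "Balanced Accuracy")])
print(cm_FeatEval_Mean_LRM1$overall[c("Accuracy", "Kappa", "AccuracyLower", "AccuracyUpper")])
```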
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_FeatEval_Mean_LRM1_Accuracy <- cm_FeatEval_Mean_LRM1$overall["Accuracy"]
cm_FeatEval_Mean_LRM1_Kappa <- cm_FeatEval_Mean_LRM1$overall["Kappa"]
print(cm_FeatEval_Mean_LRM1_Accuracy)
## Accuracy
## 0.8556701
print(cm_FeatEval_Mean_LRM1_Kappa)
## Kappa
## 0.6636949
print(model_LRM1)
## glmnet
##
## 454 samples
## 250 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 363, 363, 363, 364
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.0001769938 0.7906471 0.5288227
## 0.10 0.0017699384 0.7862759 0.5169254
## 0.10 0.0176993845 0.7950183 0.5282824
## 0.55 0.0001769938 0.7532357 0.4484439
## 0.55 0.0017699384 0.7487912 0.4385364
## 0.55 0.0176993845 0.7268132 0.3675163
## 1.00 0.0001769938 0.7312576 0.4017938
## 1.00 0.0017699384 0.7334310 0.3991983
## 1.00 0.0176993845 0.6694994 0.2129710
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.01769938.
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
FeatEval_Mean_LRM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.997797356828194"
print(FeatEval_Mean_LRM1_trainAccuracy)
## [1] 0.9977974
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM1)
## [1] 0.7483299
FeatEval_Mean_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print(FeatEval_Mean_mean_accuracy_cv_LRM1)
## [1] 0.7483299
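Note that `mean_accuracy_model_LRM1` averages over all nine tuning combinations, including poorly tuned ones. If the cross-validated accuracy of the selected model alone is wanted, it can be recovered by matching `bestTune` against the resampling results table; a sketch:

```r
# Row of the resampling results corresponding to the chosen alpha/lambda pair
best_row <- merge(model_LRM1$bestTune, model_LRM1$results)
print(best_row[, c("alpha", "lambda", "Accuracy", "Kappa")])
```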
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG == 5) {
  # CN vs MCI: "MCI" is treated as the positive class
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "MCI"],
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_LRM1_AUC <- auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 6) {
  # CN vs AD and MCI vs AD: "Dementia" is treated as the positive class
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "Dementia"],
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_LRM1_AUC <- auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 3) {
  # Two-class CI vs CN: "CI" is treated as the positive class
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "CI"],
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_LRM1_AUC <- auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
##
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
##
## Data: prob_predictions[, "CI"] in 66 controls (testData$DX CN) < 128 cases (testData$DX CI).
## Area under the curve: 0.8995
## [1] "The auc value is:"
## Area under the curve: 0.8995
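The three binary branches above differ only in which probability column is treated as the positive class. They could be driven by a single lookup instead; this is a sketch, with the flag-to-class mapping taken from the comments in the INPUT section:

```r
# Positive-class column per binary setting:
# 3 = CI vs CN, 4 = CN vs AD, 5 = CN vs MCI, 6 = MCI vs AD
pos_class <- switch(as.character(METHOD_FEATURE_FLAG),
                    "3" = "CI", "4" = "Dementia",
                    "5" = "MCI", "6" = "Dementia")
if (!is.null(pos_class)) {
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, pos_class],
                   levels = rev(levels(testData$DX)))
  FeatEval_Mean_LRM1_AUC <- roc_curve$auc
  print(roc_curve)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
```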
if (METHOD_FEATURE_FLAG == 1) {
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  # One-vs-rest ROC: treat each class in turn as the positive class
  for (class in classes) {
    binary_labels <- ifelse(testData$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  # Use one colour vector so the curves and the legend stay in sync
  roc_cols <- seq_along(classes) + 1
  plot(roc_curves[[1]], col = roc_cols[1],
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = roc_cols[i], lwd = 2)
  }
  legend("bottomright", legend = classes, col = roc_cols, lwd = 2)
}
if (METHOD_FEATURE_FLAG == 1) {
  mean_auc <- mean(auc_values)
  cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
  FeatEval_Mean_LRM1_AUC <- mean_auc
}
print(FeatEval_Mean_LRM1_AUC)
## Area under the curve: 0.8995
importance_model_LRM1 <- varImp(model_LRM1)
print(importance_model_LRM1)
## glmnet variable importance
##
## only 20 most important variables shown (out of 250)
##
## Overall
## PC2 100.00
## PC1 98.69
## PC3 85.31
## cg09727210 70.51
## cg23432430 69.28
## cg07158503 58.72
## cg00962106 56.58
## cg06697310 56.38
## cg09015880 51.41
## cg02225060 50.11
## cg10701746 50.08
## cg16338321 48.90
## cg00819121 48.72
## cg14168080 46.30
## cg21757617 45.96
## cg00415024 45.10
## cg01910713 43.96
## cg16858433 43.85
## cg00004073 43.50
## cg05064044 43.33
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")
importance_model_LRM1_df<-importance_model_LRM1$importance
if (METHOD_FEATURE_FLAG == 3 || METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 5 || METHOD_FEATURE_FLAG == 6) {
  importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)
  library(dplyr)
  # arrange() drops data.frame row names, so keep the feature names in a column first
  ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>%
    tibble::rownames_to_column("Feature") %>%
    arrange(desc(Overall))
  print(ordered_importance_final_model_LRM1)
}
## Overall
## 1 2.575692864
## 2 2.542053072
## 3 2.197354106
## 4 1.816131146
## 5 1.784377261
## 6 1.512529211
## 7 1.457354838
## 8 1.452102351
## ... (rows 9-239 omitted; Overall importance declines monotonically toward zero)
## (rows 240-250 are all 0.000000000)
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class case, take each feature's maximum importance
  # across the three classes and sort by it.
  importance_model_LRM1_df$Feature <- rownames(importance_model_LRM1_df)
  importance_model_LRM1_df <- importance_model_LRM1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))
  print(importance_model_LRM1_df)
}
# Install reshape2 on first use, then load it.
if (!require(reshape2)) {
  install.packages("reshape2")
}
library(reshape2)
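As a quick illustration of what `melt` produces here (toy data, not the model output — the feature names and values below are made up):

```r
library(reshape2)
# Wide format: one row per feature, one column per class.
toy <- data.frame(Feature = c("f1", "f2"), CN = c(1, 2), MCI = c(3, 4))
# Long format: one row per (feature, class) pair, which is what
# ggplot's dodged bar chart expects.
melt(toy, id.vars = "Feature", variable.name = "Class", value.name = "Importance")
#   Feature Class Importance
# 1      f1    CN          1
# 2      f2    CN          2
# 3      f1   MCI          3
# 4      f2   MCI          4
```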
if(METHOD_FEATURE_FLAG == 1){
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
  ggplot(importance_melted_LRM1_df,
         aes(x = reorder(Feature, -Importance),
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
  print(importance_model_LRM1_df %>% head(20))
  print("The top 20 features based on maximum importance:")
  print(head(importance_model_LRM1_df, n = 20)$Feature)
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    head(20) %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
  ggplot(importance_melted_LRM1_df,
         aes(x = reorder(Feature, -Importance),
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()
}
table(df_LRM1$DX)
##
## CI CN
## 427 221
prop.table(table(df_LRM1$DX))
##
## CI CN
## 0.6589506 0.3410494
table(trainData$DX)
##
## CI CN
## 299 155
prop.table(table(trainData$DX))
##
## CI CN
## 0.6585903 0.3414097
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")
For the training data set:
barplot(table(trainData$DX), main = "Train Data Class Distribution")
Let’s calculate the imbalance ratio: the number of samples in the majority class divided by the number of samples in the minority class. A high ratio indicates severe class imbalance.
class_counts <- table(df_LRM1$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the whole data set is:")
## [1] "The imbalance ratio of the whole data set is:"
print(imbalance_ratio)
## [1] 1.932127
class_counts <- table(trainData$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the training data set is:")
## [1] "The imbalance ratio of the training data set is:"
print(imbalance_ratio)
## [1] 1.929032
Let’s run a chi-square test, which can determine whether the class distribution deviates significantly from a balanced distribution. The test’s p-value indicates how significant the class imbalance is.
chisq.test(table(df_LRM1$DX))
##
## Chi-squared test for given probabilities
##
## data: table(df_LRM1$DX)
## X-squared = 65.488, df = 1, p-value = 5.848e-16
chisq.test(table(trainData$DX))
##
## Chi-squared test for given probabilities
##
## data: table(trainData$DX)
## X-squared = 45.674, df = 1, p-value = 1.397e-11
library(smotefamily)
smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"],
                          target = trainData$DX, K = 5, dup_size = 1)
# Extract the new balanced dataset
balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
##
## CI CN
## 299 310
dim(balanced_data_LGR_1)
## [1] 609 251
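The class counts above follow directly from `dup_size`: SMOTE generates `dup_size` synthetic points per minority sample, so the 155 CN training samples become 155 + 155 = 310 rows while the 299 CI rows are untouched. A minimal sketch on toy data (the names and sizes here are illustrative, not from this study):

```r
# Toy illustration of smotefamily::SMOTE's dup_size argument:
# it creates n_minority * dup_size synthetic minority samples.
library(smotefamily)
set.seed(1)
toy_X <- data.frame(x1 = rnorm(40), x2 = rnorm(40))
toy_y <- c(rep("A", 30), rep("B", 10))   # 30 majority, 10 minority
sm <- SMOTE(X = toy_X, target = toy_y, K = 3, dup_size = 2)
table(sm$data$class)                     # "B" grows by 10 * 2 = 20, to 30
```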
ctrl <- trainControl(method = "cv", number = 5)
model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM2, newdata = testData)
caret::confusionMatrix(predictions, testData$DX)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 117 17
## CN 11 49
##
## Accuracy : 0.8557
## 95% CI : (0.7982, 0.9019)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 5.787e-10
##
## Kappa : 0.6713
##
## Mcnemar's Test P-Value : 0.3447
##
## Sensitivity : 0.9141
## Specificity : 0.7424
## Pos Pred Value : 0.8731
## Neg Pred Value : 0.8167
## Prevalence : 0.6598
## Detection Rate : 0.6031
## Detection Prevalence : 0.6907
## Balanced Accuracy : 0.8282
##
## 'Positive' Class : CI
##
print(model_LRM2)
## glmnet
##
## 609 samples
## 250 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 487, 488, 487, 487, 487
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.0001970375 0.8932259 0.7860533
## 0.10 0.0019703751 0.8997832 0.7992033
## 0.10 0.0197037514 0.8965045 0.7925593
## 0.55 0.0001970375 0.8800975 0.7596369
## 0.55 0.0019703751 0.8784447 0.7563681
## 0.55 0.0197037514 0.8325295 0.6641846
## 1.00 0.0001970375 0.8751795 0.7498703
## 1.00 0.0019703751 0.8620512 0.7233841
## 1.00 0.0197037514 0.7783634 0.5555074
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.001970375.
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.8662422
importance_model_LRM2 <- varImp(model_LRM2)
print(importance_model_LRM2)
## glmnet variable importance
##
## only 20 most important variables shown (out of 250)
##
## Overall
## PC3 100.00
## PC1 74.85
## PC2 64.52
## cg09727210 47.86
## cg23432430 42.70
## cg06697310 40.13
## cg07158503 38.56
## cg09015880 38.49
## cg01910713 37.02
## cg10701746 37.02
## cg16858433 36.92
## cg00962106 36.72
## cg02225060 35.61
## cg16338321 34.02
## cg00819121 33.92
## cg05064044 31.40
## cg14168080 31.32
## cg26081710 30.24
## cg21757617 30.22
## cg00415024 30.21
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")
importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
  importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)
  library(dplyr)
  ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))
  print(ordered_importance_final_model_LRM2)
}
## Overall
## 1 7.626168392
## 2 5.708243293
## 3 4.920738272
## 4 3.650035214
## 5 3.256426793
## 6 3.060475169
## 7 2.940580525
## 8 2.935360538
## ... (rows 9-240 omitted; Overall importance declines monotonically from 2.82 to 0.003)
## (rows 241-250 are all 0.000000000)
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class case, take each feature's maximum importance
  # across the three classes and sort by it.
  importance_model_LRM2_df$Feature <- rownames(importance_model_LRM2_df)
  importance_model_LRM2_df <- importance_model_LRM2_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))
  print(importance_model_LRM2_df)
}
if(METHOD_FEATURE_FLAG == 1){
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
  ggplot(importance_melted_LRM2_df,
         aes(x = reorder(Feature, -Importance),
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
  print(importance_model_LRM2_df %>% head(20))
  print("The top 20 features based on maximum importance:")
  print(head(importance_model_LRM2_df, n = 20)$Feature)
  importance_melted_LRM2_df <- importance_model_LRM2_df %>%
    head(20) %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
  ggplot(importance_melted_LRM2_df,
         aes(x = reorder(Feature, -Importance),
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
##
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
##
## Data: prob_predictions[, "CI"] in 66 controls (testData$DX CN) < 128 cases (testData$DX CI).
## Area under the curve: 0.8929
## [1] "The auc value is:"
## Area under the curve: 0.8929
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  for (class in classes) {
    binary_labels <- ifelse(testData$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  # Use col = 2 for the first curve so its color matches the legend,
  # which assigns colors 2:(length(classes) + 1).
  plot(roc_curves[[1]], col = 2, lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = (1:length(classes)) + 1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
  mean_auc <- mean(auc_values)
  cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
}
df_ENM1<-processed_data
featureName_ENM1<-AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)
param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))
elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
                                   trControl = ctrl, tuneGrid = param_grid)
print(elastic_net_model1)
## glmnet
##
## 454 samples
## 250 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 363, 363, 363, 364
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0 0.00100000 0.8105250 0.57122481
## 0 0.05357895 0.8391697 0.62561988
## 0 0.10615789 0.8325275 0.60761462
## 0 0.15873684 0.8237118 0.58287645
## 0 0.21131579 0.8193162 0.56747468
## 0 0.26389474 0.8237363 0.57533129
## 0 0.31647368 0.8215140 0.56784456
## 0 0.36905263 0.8061050 0.52474966
## 0 0.42163158 0.8061050 0.51999323
## 0 0.47421053 0.8083272 0.52161492
## 0 0.52678947 0.8017094 0.50214775
## 0 0.57936842 0.7950672 0.47863449
## 0 0.63194737 0.7862271 0.45054427
## 0 0.68452632 0.7686203 0.39777307
## 0 0.73710526 0.7642002 0.38396262
## 0 0.78968421 0.7598046 0.36818319
## 0 0.84226316 0.7553846 0.35374295
## 0 0.89484211 0.7509890 0.34013156
## 0 0.94742105 0.7487912 0.33332759
## 0 1.00000000 0.7443956 0.31674039
## 1 0.00100000 0.7356532 0.40960120
## 1 0.05357895 0.6608059 0.03185117
## 1 0.10615789 0.6585836 0.00000000
## 1 0.15873684 0.6585836 0.00000000
## 1 0.21131579 0.6585836 0.00000000
## 1 0.26389474 0.6585836 0.00000000
## 1 0.31647368 0.6585836 0.00000000
## 1 0.36905263 0.6585836 0.00000000
## 1 0.42163158 0.6585836 0.00000000
## 1 0.47421053 0.6585836 0.00000000
## 1 0.52678947 0.6585836 0.00000000
## 1 0.57936842 0.6585836 0.00000000
## 1 0.63194737 0.6585836 0.00000000
## 1 0.68452632 0.6585836 0.00000000
## 1 0.73710526 0.6585836 0.00000000
## 1 0.78968421 0.6585836 0.00000000
## 1 0.84226316 0.6585836 0.00000000
## 1 0.89484211 0.6585836 0.00000000
## 1 0.94742105 0.6585836 0.00000000
## 1 1.00000000 0.6585836 0.00000000
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 0.05357895.
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
FeatEval_Mean_mean_accuracy_cv_ENM1<-mean_accuracy_elastic_net_model1
print(FeatEval_Mean_mean_accuracy_cv_ENM1)
## [1] 0.7279298
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_ENM1$DX)
FeatEval_Mean_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.997797356828194"
print(FeatEval_Mean_ENM1_trainAccuracy)
## [1] 0.9977974
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_FeatEval_Mean_ENM1 <- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_FeatEval_Mean_ENM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 125 16
## CN 3 50
##
## Accuracy : 0.9021
## 95% CI : (0.8513, 0.94)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 3.867e-15
##
## Kappa : 0.7709
##
## Mcnemar's Test P-Value : 0.005905
##
## Sensitivity : 0.9766
## Specificity : 0.7576
## Pos Pred Value : 0.8865
## Neg Pred Value : 0.9434
## Prevalence : 0.6598
## Detection Rate : 0.6443
## Detection Prevalence : 0.7268
## Balanced Accuracy : 0.8671
##
## 'Positive' Class : CI
##
cm_FeatEval_Mean_ENM1_Accuracy<-cm_FeatEval_Mean_ENM1$overall["Accuracy"]
cm_FeatEval_Mean_ENM1_Kappa<-cm_FeatEval_Mean_ENM1$overall["Kappa"]
print(cm_FeatEval_Mean_ENM1_Accuracy)
## Accuracy
## 0.9020619
print(cm_FeatEval_Mean_ENM1_Kappa)
## Kappa
## 0.7709136
importance_elastic_net_model1<- varImp(elastic_net_model1)
print(importance_elastic_net_model1)
## glmnet variable importance
##
## only 20 most important variables shown (out of 250)
##
## Overall
## PC2 100.00
## PC1 79.34
## PC3 78.32
## cg23432430 55.02
## cg09727210 53.22
## cg07158503 49.85
## cg00962106 46.66
## cg06697310 45.03
## cg02225060 42.44
## cg09015880 40.33
## cg16338321 40.04
## cg00819121 38.08
## cg10701746 37.31
## cg01910713 37.05
## cg16858433 37.01
## cg00415024 36.88
## cg05064044 36.51
## cg21757617 36.22
## cg00004073 35.95
## cg02887598 35.05
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")
importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
  importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)
  library(dplyr)
  Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>% arrange(desc(Overall))
  print(Ordered_importance_elastic_net_final_model1)
}
## Overall
## 1 2.148573032
## 2 1.706255190
## 3 1.684378455
## 4 1.185546539
## 5 1.147028334
## 6 1.074895197
## 7 1.006472867
## 8 0.971538752
## ... (rows 9-250 omitted; Overall importance declines monotonically from 0.916 to 0.007)
if(METHOD_FEATURE_FLAG==1){
  # For the multi-class case, take each feature's maximum importance
  # across the three classes and sort by it.
  importance_elastic_net_model1_df$Feature <- rownames(importance_elastic_net_model1_df)
  importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))
  print(importance_elastic_net_model1_df)
}
if(METHOD_FEATURE_FLAG == 1){
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
  ggplot(importance_melted_elastic_net_model1_df,
         aes(x = reorder(Feature, -Importance),
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
  print(importance_elastic_net_model1_df %>% head(20))
  print("The top 20 features based on maximum importance:")
  print(head(importance_elastic_net_model1_df, n = 20)$Feature)
  importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
    head(20) %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
  ggplot(importance_melted_elastic_net_model1_df,
         aes(x = reorder(Feature, -Importance),
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
  roc_curve <- roc(testData_ENM1$DX,
                   prob_predictions[, "MCI"],
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_ENM1_AUC <- auc_value
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
  roc_curve <- roc(testData_ENM1$DX,
                   prob_predictions[, "Dementia"],
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_ENM1_AUC <- auc_value
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
  roc_curve <- roc(testData_ENM1$DX,
                   prob_predictions[, "CI"],
                   levels = rev(levels(testData_ENM1$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Mean_ENM1_AUC <- auc_value
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## Area under the curve: 0.9157
if (METHOD_FEATURE_FLAG ==1){
  prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData_ENM1$DX)
  for (class in classes) {
    binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  # Use col = 2 for the first curve so its color matches the legend,
  # which assigns colors 2:(length(classes) + 1).
  plot(roc_curves[[1]], col = 2, lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = (1:length(classes)) + 1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
  mean_auc <- mean(auc_values)
  cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
  FeatEval_Mean_ENM1_AUC <- mean_auc
}
library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1<-processed_data
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
xgb_model <- caret::train(
DX ~ ., data = trainData_XGB1,
method = "xgbTree", trControl = cv_control,
metric = "Accuracy"
)
print(xgb_model)
## eXtreme Gradient Boosting
##
## 454 samples
## 250 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 363, 363, 363, 364
## Resampling results across tuning parameters:
##
## eta max_depth colsample_bytree subsample nrounds Accuracy Kappa
## 0.3 1 0.6 0.50 50 0.6365812 0.10788567
## 0.3 1 0.6 0.50 100 0.6674481 0.21995123
## 0.3 1 0.6 0.50 150 0.6849573 0.25290695
## 0.3 1 0.6 0.75 50 0.6189255 0.04382282
## 0.3 1 0.6 0.75 100 0.6476190 0.13266136
## 0.3 1 0.6 0.75 150 0.6586325 0.17109323
## 0.3 1 0.6 1.00 50 0.6101099 -0.02231134
## 0.3 1 0.6 1.00 100 0.6344078 0.06988703
## 0.3 1 0.6 1.00 150 0.6342857 0.07373553
## 0.3 1 0.8 0.50 50 0.6189499 0.06631012
## 0.3 1 0.8 0.50 100 0.6410256 0.15370968
## 0.3 1 0.8 0.50 150 0.6828571 0.25808227
## 0.3 1 0.8 0.75 50 0.6057631 0.01027818
## 0.3 1 0.8 0.75 100 0.6409768 0.12634940
## 0.3 1 0.8 0.75 150 0.6542125 0.16304486
## 0.3 1 0.8 1.00 50 0.5836874 -0.08348795
## 0.3 1 0.8 1.00 100 0.6321612 0.05948269
## 0.3 1 0.8 1.00 150 0.6365568 0.08324654
## 0.3 2 0.6 0.50 50 0.6522100 0.15464981
## 0.3 2 0.6 0.50 100 0.6874237 0.22369222
## 0.3 2 0.6 0.50 150 0.6676190 0.17752609
## 0.3 2 0.6 0.75 50 0.6320879 0.08757399
## 0.3 2 0.6 0.75 100 0.6607570 0.15348443
## 0.3 2 0.6 0.75 150 0.6673993 0.15993719
## 0.3 2 0.6 1.00 50 0.6189988 0.04315657
## 0.3 2 0.6 1.00 100 0.6255922 0.06107384
## 0.3 2 0.6 1.00 150 0.6344567 0.07882187
## 0.3 2 0.8 0.50 50 0.6299145 0.09854096
## 0.3 2 0.8 0.50 100 0.6762882 0.21190122
## 0.3 2 0.8 0.50 150 0.6784371 0.22950354
## 0.3 2 0.8 0.75 50 0.6586569 0.13460074
## 0.3 2 0.8 0.75 100 0.6585348 0.14386674
## 0.3 2 0.8 0.75 150 0.6828083 0.20497846
## 0.3 2 0.8 1.00 50 0.6366300 0.05173251
## 0.3 2 0.8 1.00 100 0.6543101 0.10006168
## 0.3 2 0.8 1.00 150 0.6520391 0.09450595
## 0.3 3 0.6 0.50 50 0.6740659 0.20259786
## 0.3 3 0.6 0.50 100 0.6807326 0.20698701
## 0.3 3 0.6 0.50 150 0.6872772 0.21617104
## 0.3 3 0.6 0.75 50 0.6651282 0.14713367
## 0.3 3 0.6 0.75 100 0.6607326 0.13947132
## 0.3 3 0.6 0.75 150 0.6607326 0.14505460
## 0.3 3 0.6 1.00 50 0.6784371 0.16219525
## 0.3 3 0.6 1.00 100 0.6717705 0.14372236
## 0.3 3 0.6 1.00 150 0.6717949 0.14313190
## 0.3 3 0.8 0.50 50 0.6959951 0.23440527
## 0.3 3 0.8 0.50 100 0.7048840 0.26749494
## 0.3 3 0.8 0.50 150 0.6937973 0.24051736
## 0.3 3 0.8 0.75 50 0.6762149 0.18476119
## 0.3 3 0.8 0.75 100 0.6740415 0.17795943
## 0.3 3 0.8 0.75 150 0.6807082 0.19516713
## 0.3 3 0.8 1.00 50 0.6606838 0.10947103
## 0.3 3 0.8 1.00 100 0.6651038 0.13783716
## 0.3 3 0.8 1.00 150 0.6651526 0.13675649
## 0.4 1 0.6 0.50 50 0.5902564 0.01434726
## 0.4 1 0.6 0.50 100 0.6343101 0.13629136
## 0.4 1 0.6 0.50 150 0.6454457 0.17359979
## 0.4 1 0.6 0.75 50 0.6012210 0.02018018
## 0.4 1 0.6 0.75 100 0.6453480 0.16060090
## 0.4 1 0.6 0.75 150 0.6562882 0.18007152
## 0.4 1 0.6 1.00 50 0.5945543 -0.01020432
## 0.4 1 0.6 1.00 100 0.6144811 0.05431029
## 0.4 1 0.6 1.00 150 0.6276923 0.09430014
## 0.4 1 0.8 0.50 50 0.6167033 0.07147824
## 0.4 1 0.8 0.50 100 0.6277656 0.10948056
## 0.4 1 0.8 0.50 150 0.6651282 0.20264658
## 0.4 1 0.8 0.75 50 0.6232967 0.08840062
## 0.4 1 0.8 0.75 100 0.6299634 0.10661153
## 0.4 1 0.8 0.75 150 0.6388767 0.13715347
## 0.4 1 0.8 1.00 50 0.6101343 0.01054855
## 0.4 1 0.8 1.00 100 0.6233455 0.07458927
## 0.4 1 0.8 1.00 150 0.6255189 0.07350157
## 0.4 2 0.6 0.50 50 0.6387057 0.13305782
## 0.4 2 0.6 0.50 100 0.6587302 0.18893277
## 0.4 2 0.6 0.50 150 0.6631258 0.19956340
## 0.4 2 0.6 0.75 50 0.6342857 0.11592014
## 0.4 2 0.6 0.75 100 0.6585348 0.16608439
## 0.4 2 0.6 0.75 150 0.6608059 0.16909425
## 0.4 2 0.6 1.00 50 0.6388278 0.09655057
## 0.4 2 0.6 1.00 100 0.6695726 0.16570314
## 0.4 2 0.6 1.00 150 0.6717460 0.16319353
## 0.4 2 0.8 0.50 50 0.6454457 0.14740821
## 0.4 2 0.8 0.50 100 0.6763370 0.21877314
## 0.4 2 0.8 0.50 150 0.6851526 0.23453710
## 0.4 2 0.8 0.75 50 0.6520147 0.14830491
## 0.4 2 0.8 0.75 100 0.6629548 0.18380401
## 0.4 2 0.8 0.75 150 0.6783639 0.21450927
## 0.4 2 0.8 1.00 50 0.6256899 0.06924968
## 0.4 2 0.8 1.00 100 0.6476679 0.11665842
## 0.4 2 0.8 1.00 150 0.6564591 0.14400806
## 0.4 3 0.6 0.50 50 0.6651526 0.19458545
## 0.4 3 0.6 0.50 100 0.6717460 0.20483072
## 0.4 3 0.6 0.50 150 0.6937485 0.25406111
## 0.4 3 0.6 0.75 50 0.6674481 0.15965126
## 0.4 3 0.6 0.75 100 0.6763370 0.17411914
## 0.4 3 0.6 0.75 150 0.6807326 0.18571779
## 0.4 3 0.6 1.00 50 0.6564103 0.12427286
## 0.4 3 0.6 1.00 100 0.6431990 0.11344816
## 0.4 3 0.6 1.00 150 0.6476190 0.12390111
## 0.4 3 0.8 0.50 50 0.6651526 0.19987697
## 0.4 3 0.8 0.50 100 0.6827839 0.24242412
## 0.4 3 0.8 0.50 150 0.6761661 0.23503793
## 0.4 3 0.8 0.75 50 0.6564591 0.12325398
## 0.4 3 0.8 0.75 100 0.6718926 0.16716183
## 0.4 3 0.8 0.75 150 0.6850061 0.21132636
## 0.4 3 0.8 1.00 50 0.6475702 0.10075691
## 0.4 3 0.8 1.00 100 0.6564103 0.11679435
## 0.4 3 0.8 1.00 150 0.6497680 0.09859060
##
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 100, max_depth = 3, eta = 0.3, gamma = 0, colsample_bytree = 0.8, min_child_weight = 1 and subsample = 0.5.
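The grid searched above is caret's default expansion for `method = "xgbTree"`. An equivalent explicit grid can be sketched with base R's `expand.grid` and passed to `caret::train(..., tuneGrid = xgb_grid)` when finer control is wanted (values here simply mirror the printed ranges):

```r
# Explicit tuning grid mirroring the ranges in the printed results
# (illustrative; caret built an equivalent grid by default).
xgb_grid <- expand.grid(
  nrounds          = c(50, 100, 150),
  max_depth        = 1:3,
  eta              = c(0.3, 0.4),
  gamma            = 0,
  colsample_bytree = c(0.6, 0.8),
  min_child_weight = 1,
  subsample        = c(0.50, 0.75, 1.00)
)
nrow(xgb_grid)  # 108 combinations, one per accuracy/kappa row above
```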
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.652953
FeatEval_Mean_mean_accuracy_cv_xgb<-mean_accuracy_xgb_model
print(FeatEval_Mean_mean_accuracy_cv_xgb)
## [1] 0.652953
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
FeatEval_Mean_xgb_trainAccuracy <- train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
print(FeatEval_Mean_xgb_trainAccuracy)
## [1] 1
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_FeatEval_Mean_xgb <-caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_FeatEval_Mean_xgb)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 112 44
## CN 16 22
##
## Accuracy : 0.6907
## 95% CI : (0.6205, 0.755)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 0.2030127
##
## Kappa : 0.2322
##
## Mcnemar's Test P-Value : 0.0004909
##
## Sensitivity : 0.8750
## Specificity : 0.3333
## Pos Pred Value : 0.7179
## Neg Pred Value : 0.5789
## Prevalence : 0.6598
## Detection Rate : 0.5773
## Detection Prevalence : 0.8041
## Balanced Accuracy : 0.6042
##
## 'Positive' Class : CI
##
cm_FeatEval_Mean_xgb_Accuracy <-cm_FeatEval_Mean_xgb$overall["Accuracy"]
cm_FeatEval_Mean_xgb_Kappa <-cm_FeatEval_Mean_xgb$overall["Kappa"]
print(cm_FeatEval_Mean_xgb_Accuracy)
## Accuracy
## 0.6907216
print(cm_FeatEval_Mean_xgb_Kappa)
## Kappa
## 0.23219
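The accuracy and Cohen's kappa that `confusionMatrix` reports can be reproduced directly from the 2x2 table printed above (a base-R sketch using those counts):

```r
# Reproduce Accuracy and Cohen's Kappa from the printed 2x2 table
# (rows = predictions, columns = reference), as caret computes them.
cm <- matrix(c(112, 16, 44, 22), nrow = 2,
             dimnames = list(Prediction = c("CI", "CN"),
                             Reference  = c("CI", "CN")))
n        <- sum(cm)
accuracy <- sum(diag(cm)) / n                     # observed agreement
expected <- sum(rowSums(cm) * colSums(cm)) / n^2  # chance agreement
kappa    <- (accuracy - expected) / (1 - expected)
round(c(Accuracy = accuracy, Kappa = kappa), 5)   # matches the output above
```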
importance_xgb_model<- varImp(xgb_model)
print(importance_xgb_model)
## xgbTree variable importance
##
## only 20 most important variables shown (out of 250)
##
## Overall
## cg03749159 100.00
## age.now 76.15
## cg11331837 70.86
## cg22666875 67.12
## cg02631626 64.79
## cg11019791 59.38
## cg16779438 53.84
## cg04124201 53.50
## cg10240127 52.94
## cg08041188 51.40
## cg12689021 49.32
## cg14168080 48.41
## cg06864789 46.19
## cg23432430 43.97
## cg01008088 43.78
## cg25436480 41.91
## cg26846609 41.47
## cg16431720 41.28
## cg08745107 38.83
## cg16338321 38.03
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")
importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)
ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
## Feature Gain Cover Frequency Importance
## <char> <num> <num> <num> <num>
## 1: cg03749159 2.954501e-02 0.0147704384 0.010548523 2.954501e-02
## 2: age.now 2.249708e-02 0.0166658714 0.014767932 2.249708e-02
## 3: cg11331837 2.093539e-02 0.0228889338 0.014767932 2.093539e-02
## 4: cg22666875 1.983190e-02 0.0106072872 0.006329114 1.983190e-02
## 5: cg02631626 1.914240e-02 0.0154894087 0.008438819 1.914240e-02
## ---
## 211: cg14710850 1.227445e-04 0.0004908317 0.002109705 1.227445e-04
## 212: cg08896901 8.861563e-05 0.0005177839 0.002109705 8.861563e-05
## 213: cg04664583 8.729897e-05 0.0004251376 0.002109705 8.729897e-05
## 214: cg11706829 6.295243e-05 0.0004132903 0.002109705 6.295243e-05
## 215: cg09451339 1.458784e-05 0.0005991367 0.002109705 1.458784e-05
stopCluster(c2)
registerDoSEQ()
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_xgb_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_xgb_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_xgb_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## Area under the curve: 0.7398
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_XGB1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Mean_xgb_AUC <- mean_auc
}
print(FeatEval_Mean_xgb_AUC)
## Area under the curve: 0.7398
library(caret)
library(randomForest)
df_RFM1<-processed_data
featureName_RFM1<-AfterProcess_FeatureName
library(randomForest)
set.seed(123)
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]
X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)
rf_model <- caret::train(
DX ~ ., data = train_data_RFM1,
method = "rf", trControl = ctrl,
metric = "Accuracy",
importance = TRUE
)
print(rf_model)
## Random Forest
##
## 454 samples
## 250 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 363, 363, 363, 364
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.6608059 0.008374734
## 126 0.6695971 0.060559327
## 250 0.6674725 0.056471002
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 126.
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_rf_model)
## [1] 0.6659585
FeatEval_Mean_mean_accuracy_cv_rf<-mean_accuracy_rf_model
print(FeatEval_Mean_mean_accuracy_cv_rf)
## [1] 0.6659585
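Note that `FeatEval_Mean_mean_accuracy_cv_rf` averages accuracy over every row of the tuning grid, not just the selected `mtry = 126` model, so it can understate the chosen model's accuracy. With only three `mtry` values this is simply (a minimal check against the printed results):

```r
# Mean CV accuracy across the three mtry settings printed above
# (mtry = 2, 126, 250); averaging over all tuning rows, not just
# the selected model.
cv_acc <- c(0.6608059, 0.6695971, 0.6674725)
mean(cv_acc)
```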
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")
train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
FeatEval_Mean_rf_trainAccuracy<-train_accuracy
print(FeatEval_Mean_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_FeatEval_Mean_rf<-caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_FeatEval_Mean_rf)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 125 61
## CN 3 5
##
## Accuracy : 0.6701
## 95% CI : (0.5991, 0.7358)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 0.4131
##
## Kappa : 0.0665
##
## Mcnemar's Test P-Value : 1.041e-12
##
## Sensitivity : 0.97656
## Specificity : 0.07576
## Pos Pred Value : 0.67204
## Neg Pred Value : 0.62500
## Prevalence : 0.65979
## Detection Rate : 0.64433
## Detection Prevalence : 0.95876
## Balanced Accuracy : 0.52616
##
## 'Positive' Class : CI
##
cm_FeatEval_Mean_rf_Accuracy<-cm_FeatEval_Mean_rf$overall["Accuracy"]
print(cm_FeatEval_Mean_rf_Accuracy)
## Accuracy
## 0.6701031
cm_FeatEval_Mean_rf_Kappa<-cm_FeatEval_Mean_rf$overall["Kappa"]
print(cm_FeatEval_Mean_rf_Kappa)
## Kappa
## 0.06646617
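The near-zero kappa, despite 0.67 accuracy, reflects that this RF predicts CI for almost every sample. The balanced accuracy that `confusionMatrix` prints (the mean of sensitivity and specificity) makes this visible, as a quick base-R check on the printed counts shows:

```r
# Balanced accuracy from the printed RF confusion matrix:
# plain accuracy (0.6701) hides the near-chance performance on CN.
sens <- 125 / (125 + 3)   # CI correctly predicted (sensitivity)
spec <- 5 / (5 + 61)      # CN correctly predicted (specificity)
(sens + spec) / 2         # balanced accuracy, as in the output above
```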
importance_rf_model <- varImp(rf_model)
print(importance_rf_model)
## rf variable importance
##
## only 20 most important variables shown (out of 250)
##
## Importance
## cg03749159 100.00
## cg23432430 89.91
## cg01008088 75.78
## cg13405878 73.11
## cg21697769 69.34
## cg06277607 64.80
## cg03982462 64.33
## cg05234269 62.11
## cg06864789 61.78
## cg06697310 61.78
## cg25712921 60.38
## cg11331837 59.36
## cg00696044 58.88
## cg00415024 58.48
## cg03600007 57.05
## cg14170504 56.16
## cg12689021 56.07
## cg01128042 56.01
## cg02887598 54.71
## cg23836570 54.65
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")
importance_rf_model_df<-importance_rf_model$importance
if( METHOD_FEATURE_FLAG==5 ){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(MCI))
print(Ordered_importance_rf_final_model)
}
if( METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==6 ){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(Dementia))
print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==3 ){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(CI))
print(Ordered_importance_rf_final_model)
}
## CI CN
## 1 4.691019510 4.691019510
## 2 4.028808935 4.028808935
## 3 3.102179448 3.102179448
## 4 2.926732215 2.926732215
## 5 2.679101610 2.679101610
## ---
## 246 -1.505647738 -1.505647738
## 247 -1.559796853 -1.559796853
## 248 -1.668414703 -1.668414703
## 249 -1.810593575 -1.810593575
## 250 -1.870262044 -1.870262044
if(METHOD_FEATURE_FLAG==1){
# for the multi classification case,
# for each feature, we will choose the maximum importance value
# Add a column for the maximum importance
importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
importance_rf_model_df <- importance_rf_model_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_rf_model_df)
}
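In the multi-class branch above, `pmax()` takes the element-wise maximum across the three class-importance columns, so each feature is ranked by its best class-specific importance. A minimal base-R illustration with toy values:

```r
# pmax() keeps each feature's maximum class-specific importance
# (toy values, base R only; the chunk above does the same via dplyr).
imp <- data.frame(Feature  = c("cg_a", "cg_b", "cg_c"),
                  CN       = c(10, 40, 25),
                  Dementia = c(60, 15, 20),
                  MCI      = c(30, 35, 50))
imp$MaxImportance <- pmax(imp$CN, imp$Dementia, imp$MCI)
imp[order(-imp$MaxImportance), c("Feature", "MaxImportance")]
```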
if(METHOD_FEATURE_FLAG == 1){
library(reshape2)
library(ggplot2)
importance_melted_rf_model_df <- importance_rf_model_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_rf_model_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_rf_model_df,n=20)$Feature)
importance_melted_rf_model_df <- importance_rf_model_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_rf_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_rf_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_rf_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## Area under the curve: 0.7016
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_RFM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Mean_rf_AUC <- mean_auc
}
print(FeatEval_Mean_rf_AUC)
## Area under the curve: 0.7016
df_SVM<-processed_data
featureName_SVM1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]
X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)
svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
method = "svmRadial",
trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel
##
## 454 samples
## 250 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 363, 364, 363, 363
## Resampling results across tuning parameters:
##
## C Accuracy Kappa
## 0.25 0.8259829 0.6256205
## 0.50 0.8215629 0.6175088
## 1.00 0.8304029 0.6320648
##
## Tuning parameter 'sigma' was held constant at a value of 0.002022917
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.002022917 and C = 1.
print(svm_model$bestTune)
## sigma C
## 3 0.002022917 1
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.8259829
FeatEval_Mean_mean_accuracy_cv_svm<-mean_accuracy_svm_model
print(FeatEval_Mean_mean_accuracy_cv_svm)
## [1] 0.8259829
train_predictions <- predict(svm_model, newdata = train_data_SVM1)
train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.980176211453744"
FeatEval_Mean_svm_trainAccuracy <- train_accuracy
print(FeatEval_Mean_svm_trainAccuracy)
## [1] 0.9801762
predictions <- predict(svm_model, newdata = test_data_SVM1)
cm_FeatEval_Mean_svm<-caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_FeatEval_Mean_svm)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 111 12
## CN 17 54
##
## Accuracy : 0.8505
## 95% CI : (0.7924, 0.8975)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 1.742e-09
##
## Kappa : 0.673
##
## Mcnemar's Test P-Value : 0.4576
##
## Sensitivity : 0.8672
## Specificity : 0.8182
## Pos Pred Value : 0.9024
## Neg Pred Value : 0.7606
## Prevalence : 0.6598
## Detection Rate : 0.5722
## Detection Prevalence : 0.6340
## Balanced Accuracy : 0.8427
##
## 'Positive' Class : CI
##
cm_FeatEval_Mean_svm_Accuracy <- cm_FeatEval_Mean_svm$overall["Accuracy"]
cm_FeatEval_Mean_svm_Kappa <- cm_FeatEval_Mean_svm$overall["Kappa"]
print(cm_FeatEval_Mean_svm_Accuracy)
## Accuracy
## 0.8505155
print(cm_FeatEval_Mean_svm_Kappa)
## Kappa
## 0.673021
Let’s take a look at the feature importance of the trained model.
library(iml)
predictor_SVM <- Predictor$new(svm_model,data = df_SVM,y=df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM,loss="ce")
print(importance_SVM)
## Interpretation method: FeatureImp
## error function: ce
##
## Analysed predictor:
## Prediction task: classification
## Classes:
##
## Analysed data:
## Sampling from data.frame with 648 rows and 251 columns.
##
##
## Head of results:
## feature importance.05 importance importance.95 permutation.error
## 1 cg26081710 1.0315789 1.105263 1.126316 0.06481481
## 2 cg11227702 1.0368421 1.078947 1.105263 0.06327160
## 3 cg15535896 0.9684211 1.078947 1.121053 0.06327160
## 4 cg27160885 1.0315789 1.078947 1.105263 0.06327160
## 5 cg07158503 1.0263158 1.052632 1.073684 0.06172840
## 6 cg11331837 0.9578947 1.052632 1.126316 0.06172840
plot(importance_SVM)
library(vip)
vip(svm_model, method = "permute", train = train_data_SVM1, target = "DX", nsim = 10, metric = "bal_accuracy", pred_wrapper = predict)
importance_SVM_df<-importance_SVM$results
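`iml::FeatureImp` with `loss = "ce"` measures how much the classification error grows when one feature is shuffled. The underlying idea can be sketched in base R with a toy deterministic classifier (everything here is illustrative; iml reports the ratio permuted/original error, while with zero original error the raw permuted error is shown instead):

```r
# Permutation importance, the idea behind iml::FeatureImp:
# shuffle one feature, re-measure the error, and compare it to the
# unpermuted error (toy deterministic classifier; 'a' drives the
# label, 'b' is noise).
set.seed(1)
dat <- data.frame(a = runif(200), b = runif(200))
dat$y <- as.integer(dat$a > 0.5)
predict_fn <- function(d) as.integer(d$a > 0.5)

ce <- function(truth, pred) mean(truth != pred)
base_err <- ce(dat$y, predict_fn(dat))  # 0 for this toy model

perm_err <- function(feature) {
  d <- dat
  d[[feature]] <- sample(d[[feature]])  # permute one column
  ce(dat$y, predict_fn(d))
}
c(a = perm_err("a"), b = perm_err("b"))  # shuffling 'a' hurts, 'b' does not
```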
if(METHOD_FEATURE_FLAG == 5){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The auc vlue is:")
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Mean_svm_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The auc vlue is:")
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Mean_svm_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The auc vlue is:")
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Mean_svm_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
##
## Call:
## roc.default(response = test_data_SVM1$DX, predictor = prob_predictions[, "CI"], levels = rev(levels(test_data_SVM1$DX)))
##
## Data: prob_predictions[, "CI"] in 66 controls (test_data_SVM1$DX CN) < 128 cases (test_data_SVM1$DX CI).
## Area under the curve: 0.9421
## [1] "The auc value is:"
## Area under the curve: 0.9421
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_SVM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = c("blue", seq_len(length(classes) - 1) + 2), lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Mean_svm_AUC <- mean_auc
}
print(FeatEval_Mean_svm_AUC)
## Area under the curve: 0.9421
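In the multi-class case above, each per-class AUC is a one-versus-rest comparison, and the mean over classes is what gets stored in `FeatEval_Mean_svm_AUC`. The AUC itself can also be computed without `pROC`, via the rank-sum (Mann-Whitney) identity. A self-contained sketch with hypothetical class probabilities:

```r
# AUC as P(score of a case > score of a control), via the rank-sum identity
auc_rank <- function(is_case, scores) {
  r <- rank(scores)
  n_pos <- sum(is_case); n_neg <- sum(!is_case)
  (sum(r[is_case]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(1)
classes <- c("CN", "MCI", "Dementia")      # hypothetical 3-class setup
truth <- factor(sample(classes, 300, replace = TRUE), levels = classes)
probs <- matrix(runif(300 * 3), ncol = 3, dimnames = list(NULL, classes))
# Nudge the true class's probability upward so the scores are informative
idx <- cbind(seq_len(300), as.integer(truth))
probs[idx] <- probs[idx] + 0.5
probs <- probs / rowSums(probs)

auc_ovr <- sapply(classes, function(cl) auc_rank(truth == cl, probs[, cl]))
mean_auc <- mean(auc_ovr)                  # one-versus-rest mean AUC
print(round(auc_ovr, 3)); print(round(mean_auc, 3))
```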
Performance of the output features selected based on the median importance method
processed_dataFrame<-df_selected_Median
processed_data<-output_median_feature
AfterProcess_FeatureName<-Selected_median_imp_Name
print(head(output_median_feature))
## # A tibble: 6 × 251
## DX cg23432430 PC3 age.now PC1 PC2 cg07158503 cg00962106 cg07634717 cg06697310 cg14168080 cg03660162 cg02225060 cg07504457 cg10701746 cg20678988 cg09015880 cg19799454 cg00004073
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CI 0.948 -0.0140 82.4 -0.214 0.0147 0.578 0.912 0.748 0.845 0.419 0.869 0.683 0.712 0.480 0.844 0.510 0.918 0.0293
## 2 CN 0.946 0.00506 78.6 -0.173 0.0575 0.620 0.538 0.825 0.865 0.442 0.516 0.827 0.685 0.487 0.855 0.840 0.911 0.0279
## 3 CN 0.942 0.0291 80.4 -0.00367 0.0837 0.624 0.504 0.818 0.241 0.436 0.903 0.521 0.721 0.493 0.779 0.847 0.907 0.646
## 4 CI 0.943 -0.0323 78.2 -0.187 -0.0112 0.599 0.904 0.758 0.848 0.957 0.531 0.808 0.187 0.855 0.826 0.487 0.922 0.624
## 5 CI 0.946 0.0529 62.9 0.0268 0.0000165 0.631 0.896 0.826 0.821 0.946 0.926 0.608 0.235 0.488 0.330 0.889 0.914 0.412
## 6 CN 0.951 -0.00869 80.7 -0.0379 0.0157 0.615 0.886 0.210 0.784 0.399 0.894 0.764 0.730 0.842 0.854 0.906 0.921 0.393
## # ℹ 232 more variables: cg00154902 <dbl>, cg02887598 <dbl>, cg09727210 <dbl>, cg11227702 <dbl>, cg11331837 <dbl>, cg16338321 <dbl>, cg24851651 <dbl>, cg25208881 <dbl>, cg19503462 <dbl>,
## # cg03749159 <dbl>, cg03088219 <dbl>, cg26081710 <dbl>, cg09120722 <dbl>, cg11787167 <dbl>, cg12543766 <dbl>, cg19471911 <dbl>, cg11540596 <dbl>, cg01921484 <dbl>, cg00415024 <dbl>,
## # cg12689021 <dbl>, cg21757617 <dbl>, cg01128042 <dbl>, cg17002719 <dbl>, cg16715186 <dbl>, cg05234269 <dbl>, cg12421087 <dbl>, cg05064044 <dbl>, cg15184869 <dbl>, cg23517115 <dbl>,
## # cg00819121 <dbl>, cg11019791 <dbl>, cg04156077 <dbl>, cg01910713 <dbl>, cg16779438 <dbl>, cg25169289 <dbl>, cg03979311 <dbl>, cg14710850 <dbl>, cg00648024 <dbl>, cg25712921 <dbl>,
## # cg27272246 <dbl>, cg18816397 <dbl>, cg18285382 <dbl>, cg08096656 <dbl>, cg15535896 <dbl>, cg13573375 <dbl>, cg20673830 <dbl>, cg26853071 <dbl>, cg15600437 <dbl>, cg16431720 <dbl>,
## # cg25436480 <dbl>, cg27577781 <dbl>, cg06277607 <dbl>, cg08745107 <dbl>, cg03982462 <dbl>, cg25879395 <dbl>, cg20823859 <dbl>, cg06960717 <dbl>, cg06961873 <dbl>, cg10738648 <dbl>,
## # cg20685672 <dbl>, cg09584650 <dbl>, cg07640670 <dbl>, cg12702014 <dbl>, cg16858433 <dbl>, cg00512739 <dbl>, cg15098922 <dbl>, cg26679884 <dbl>, cg16536985 <dbl>, cg24883219 <dbl>, …
print(Selected_median_imp_Name)
## [1] "cg23432430" "PC3" "age.now" "PC1" "PC2" "cg07158503" "cg00962106" "cg07634717" "cg06697310" "cg14168080" "cg03660162" "cg02225060" "cg07504457" "cg10701746" "cg20678988"
## [16] "cg09015880" "cg19799454" "cg00004073" "cg00154902" "cg02887598" "cg09727210" "cg11227702" "cg11331837" "cg16338321" "cg24851651" "cg25208881" "cg19503462" "cg03749159" "cg03088219" "cg26081710"
## [31] "cg09120722" "cg11787167" "cg12543766" "cg19471911" "cg11540596" "cg01921484" "cg00415024" "cg12689021" "cg21757617" "cg01128042" "cg17002719" "cg16715186" "cg05234269" "cg12421087" "cg05064044"
## [46] "cg15184869" "cg23517115" "cg00819121" "cg11019791" "cg04156077" "cg01910713" "cg16779438" "cg25169289" "cg03979311" "cg14710850" "cg00648024" "cg25712921" "cg27272246" "cg18816397" "cg18285382"
## [61] "cg08096656" "cg15535896" "cg13573375" "cg20673830" "cg26853071" "cg15600437" "cg16431720" "cg25436480" "cg27577781" "cg06277607" "cg08745107" "cg03982462" "cg25879395" "cg20823859" "cg06960717"
## [76] "cg06961873" "cg10738648" "cg20685672" "cg09584650" "cg07640670" "cg12702014" "cg16858433" "cg00512739" "cg15098922" "cg26679884" "cg16536985" "cg24883219" "cg05876883" "cg06371647" "cg02823329"
## [91] "cg12556569" "cg22666875" "cg13387643" "cg09216282" "cg02078724" "cg15700429" "cg17429539" "cg08584917" "cg01608425" "cg08788093" "cg22542451" "cg00084271" "cg21697769" "cg05593887" "cg18918831"
## [106] "cg08198851" "cg22931151" "cg18857647" "cg18150287" "cg00939409" "cg01008088" "cg17723206" "cg05321907" "cg12776173" "cg02932958" "cg09247979" "cg14170504" "cg25306893" "cg25758034" "cg25649515"
## [121] "cg22305850" "cg13405878" "cg14687298" "cg12240569" "cg19301366" "cg05161773" "cg11133939" "cg01933473" "cg26983017" "cg24697433" "cg18993517" "cg02122327" "cg11706829" "cg17906851" "cg17386240"
## [136] "cg15633912" "cg16571124" "cg03549208" "cg02495179" "cg06880438" "cg10681981" "cg13739190" "cg09785377" "cg11438323" "cg22071943" "cg26846609" "cg24634455" "cg01280698" "cg06833284" "cg02668233"
## [151] "cg04831745" "cg00322003" "cg01662749" "cg24307368" "cg04497611" "cg00146240" "cg00696044" "cg02627240" "cg03672288" "cg03737947" "cg04316537" "cg06118351" "cg06403901" "cg06483046" "cg06864789"
## [166] "cg07138269" "cg08554146" "cg08857872" "cg10240127" "cg11187460" "cg11286989" "cg11314779" "cg12228670" "cg13372276" "cg13653328" "cg14293999" "cg14532717" "cg14780448" "cg15730644" "cg15985500"
## [181] "cg17002338" "cg17042243" "cg17738613" "cg18819889" "cg18949721" "cg21986118" "cg23066280" "cg23916408" "cg24139837" "cg25277809" "cg27160885" "cg05392160" "cg02631626" "cg23352245" "cg21139150"
## [196] "cg04124201" "cg10666341" "cg18339359" "cg22169467" "cg04888234" "cg25059696" "cg06715136" "cg03600007" "cg10091792" "cg14192979" "cg20078646" "cg27224751" "cg04412904" "cg17129965" "cg14507637"
## [211] "cg14307563" "cg20981163" "cg22535849" "cg18029737" "cg14627380" "cg10788927" "cg08041188" "cg13226272" "cg11247378" "cg02772171" "cg04462915" "cg03221390" "cg22112152" "cg04664583" "cg20803293"
## [226] "cg09451339" "cg16733676" "cg22741595" "cg04242342" "cg00295418" "cg06012903" "cg00345083" "cg10039445" "cg13368637" "cg04718469" "cg16089727" "cg06231502" "cg02550738" "cg05850457" "cg08896901"
## [241] "cg17268094" "cg01549082" "cg12146221" "cg06394820" "cg26901661" "cg12784167" "cg13815695" "cg01462799" "cg00322820" "cg02356645"
print(head(df_selected_Median))
## DX cg23432430 PC3 age.now PC1 PC2 cg07158503 cg00962106 cg07634717 cg06697310 cg14168080 cg03660162 cg02225060 cg07504457 cg10701746 cg20678988 cg09015880
## 200223270003_R02C01 CI 0.9482702 -0.014043316 82.4 -0.214185447 0.01470293 0.5777146 0.9124898 0.7483382 0.8454609 0.4190123 0.8691767 0.6828159 0.7116230 0.4795503 0.8438718 0.5101716
## 200223270003_R03C01 CN 0.9455418 0.005055871 78.6 -0.172761185 0.05745834 0.6203543 0.5375751 0.8254434 0.8653044 0.4420256 0.5160770 0.8265195 0.6854539 0.4868342 0.8548886 0.8402106
## 200223270003_R06C01 CN 0.9418716 0.029143653 80.4 -0.003667305 0.08372861 0.6236025 0.5040948 0.8181246 0.2405168 0.4355521 0.9026304 0.5209552 0.7205633 0.4927257 0.7786685 0.8472063
## cg19799454 cg00004073 cg00154902 cg02887598 cg09727210 cg11227702 cg11331837 cg16338321 cg24851651 cg25208881 cg19503462 cg03749159 cg03088219 cg26081710 cg09120722 cg11787167
## 200223270003_R02C01 0.9178930 0.02928535 0.5137741 0.04020908 0.4240111 0.86486075 0.03692842 0.5350242 0.03674702 0.1851956 0.7951675 0.9355921 0.844002862 0.8751040 0.5878977 0.03853894
## 200223270003_R03C01 0.9106247 0.02787198 0.8540746 0.67073881 0.8812928 0.49184121 0.57150125 0.8294062 0.05358297 0.9092286 0.4537684 0.9153921 0.007435243 0.9198212 0.8287506 0.04673831
## 200223270003_R06C01 0.9066551 0.64576463 0.8188126 0.73408417 0.8493743 0.02543724 0.03182862 0.4918708 0.05968923 0.9265502 0.6997359 0.9255807 0.120155222 0.8801892 0.8793344 0.32564508
## cg12543766 cg19471911 cg11540596 cg01921484 cg00415024 cg12689021 cg21757617 cg01128042 cg17002719 cg16715186 cg05234269 cg12421087 cg05064044 cg15184869 cg23517115 cg00819121
## 200223270003_R02C01 0.51028134 0.6334393 0.9238951 0.9098550 0.4299553 0.7706828 0.03652647 0.9113420 0.04939181 0.2742789 0.93848584 0.5647607 0.5672851 0.8622328 0.2151144 0.9207001
## 200223270003_R03C01 0.88741539 0.8437175 0.8926595 0.9093137 0.3999122 0.7449475 0.44299089 0.5328806 0.40466475 0.7946153 0.57461229 0.5399655 0.5358875 0.8996252 0.9131440 0.9281472
## 200223270003_R06C01 0.02818501 0.6127952 0.8820252 0.9204487 0.7465084 0.7872237 0.44725379 0.5222757 0.51428089 0.8124316 0.02467208 0.5400348 0.5273964 0.8688117 0.8328364 0.9327211
## cg11019791 cg04156077 cg01910713 cg16779438 cg25169289 cg03979311 cg14710850 cg00648024 cg25712921 cg27272246 cg18816397 cg18285382 cg08096656 cg15535896 cg13573375 cg20673830
## 200223270003_R02C01 0.8112324 0.7321883 0.8573169 0.8826150 0.1100884 0.86644909 0.8048592 0.51410972 0.2829848 0.8615873 0.5472925 0.3202927 0.9362594 0.3382952 0.8670419 0.2422052
## 200223270003_R03C01 0.7831231 0.6865805 0.8538850 0.5466924 0.7667174 0.06199853 0.8090950 0.40202875 0.6220919 0.8705287 0.4940355 0.2930577 0.9314878 0.9253926 0.1733934 0.6881735
## 200223270003_R06C01 0.4353250 0.8501188 0.8110366 0.8629492 0.2264993 0.72615553 0.8285902 0.05579011 0.6384003 0.8103777 0.5337018 0.8923595 0.4943033 0.3320191 0.8888246 0.2134634
## cg26853071 cg15600437 cg16431720 cg25436480 cg27577781 cg06277607 cg08745107 cg03982462 cg25879395 cg20823859 cg06960717 cg06961873 cg10738648 cg20685672 cg09584650 cg07640670
## 200223270003_R02C01 0.4233820 0.4885353 0.7356099 0.8425160 0.8143535 0.10744587 0.02921338 0.8562777 0.88130864 0.9030711 0.7030978 0.5335591 0.44931577 0.6712101 0.08230254 0.58296513
## 200223270003_R03C01 0.7451354 0.4894487 0.8692449 0.4994032 0.8113185 0.09353494 0.78542320 0.6023731 0.02603438 0.6062985 0.7653402 0.5472606 0.49894016 0.7932091 0.09661586 0.55225610
## 200223270003_R06C01 0.4228079 0.8551374 0.8773137 0.3494312 0.8144274 0.09504696 0.02709928 0.8778458 0.91060615 0.8917348 0.7206218 0.9415177 0.05552024 0.6613646 0.52399749 0.04058533
## cg12702014 cg16858433 cg00512739 cg15098922 cg26679884 cg16536985 cg24883219 cg05876883 cg06371647 cg02823329 cg12556569 cg22666875 cg13387643 cg09216282 cg02078724 cg15700429
## 200223270003_R02C01 0.7704049 0.9184356 0.9337648 0.9286092 0.6793815 0.5789643 0.6430473 0.9039064 0.8336894 0.9462397 0.06218231 0.8177182 0.4229959 0.9349248 0.3096774 0.7879010
## 200223270003_R03C01 0.7848681 0.9194211 0.8863895 0.9027517 0.1848705 0.5418687 0.6822115 0.9223308 0.8198684 0.6464005 0.03924599 0.8291957 0.4200273 0.9244259 0.2896133 0.9114530
## 200223270003_R06C01 0.8065993 0.9271632 0.9242748 0.8525611 0.1701734 0.8392044 0.5296903 0.4697980 0.8069537 0.9633930 0.48636893 0.3694180 0.4161488 0.9263996 0.2805612 0.8838233
## cg17429539 cg08584917 cg01608425 cg08788093 cg22542451 cg00084271 cg21697769 cg05593887 cg18918831 cg08198851 cg22931151 cg18857647 cg18150287 cg00939409 cg01008088 cg17723206
## 200223270003_R02C01 0.7860900 0.5663205 0.9030410 0.03911678 0.5884356 0.8103611 0.8946108 0.5939220 0.4891660 0.6578905 0.9311023 0.8582332 0.7685695 0.2652180 0.8424817 0.92881042
## 200223270003_R03C01 0.7100923 0.9019732 0.9264388 0.60934160 0.8337068 0.7877006 0.2822953 0.5766550 0.5333801 0.6578186 0.9356702 0.8394132 0.7519166 0.8882671 0.2417656 0.48556255
## 200223270003_R06C01 0.7660838 0.9187789 0.8887753 0.88380243 0.8125084 0.7706165 0.8698740 0.9148338 0.6406575 0.1272153 0.9328614 0.2647491 0.2501173 0.8842646 0.2618620 0.01765023
## cg05321907 cg12776173 cg02932958 cg09247979 cg14170504 cg25306893 cg25758034 cg25649515 cg22305850 cg13405878 cg14687298 cg12240569 cg19301366 cg05161773 cg11133939 cg01933473
## 200223270003_R02C01 0.2880477 0.1038804 0.7901008 0.5070956 0.54915621 0.6265392 0.6114028 0.9279829 0.03361934 0.4549662 0.04206702 0.82772064 0.8831393 0.4120912 0.1282694 0.2589014
## 200223270003_R03C01 0.1782629 0.8730635 0.4210489 0.5706177 0.02236650 0.8330282 0.6649219 0.9235753 0.57522232 0.7858042 0.14813581 0.02690547 0.8072679 0.4154907 0.5920898 0.6726133
## 200223270003_R06C01 0.8427929 0.7009491 0.3825995 0.5090215 0.02988245 0.6175380 0.2393844 0.5895839 0.58548744 0.7583938 0.24260002 0.46030640 0.8796022 0.8526849 0.5127706 0.2642560
## cg26983017 cg24697433 cg18993517 cg02122327 cg11706829 cg17906851 cg17386240 cg15633912 cg16571124 cg03549208 cg02495179 cg06880438 cg10681981 cg13739190 cg09785377 cg11438323
## 200223270003_R02C01 0.89868232 0.9243095 0.2091538 0.38940091 0.8897234 0.9488392 0.7473400 0.1605530 0.9282854 0.9014487 0.6813307 0.8285145 0.7035090 0.8510103 0.9162088 0.4863471
## 200223270003_R03C01 0.03145466 0.6808390 0.2665896 0.37769608 0.5444785 0.9529718 0.7144809 0.9333421 0.9206431 0.8381784 0.7373055 0.7988881 0.7382662 0.8358482 0.9226292 0.8984559
## 200223270003_R06C01 0.84677625 0.6384606 0.2574003 0.04017909 0.5669449 0.6462151 0.8074824 0.8737362 0.9276842 0.9097817 0.5588114 0.7839538 0.6971989 0.8419471 0.6405193 0.8722772
## cg22071943 cg26846609 cg24634455 cg01280698 cg06833284 cg02668233 cg04831745 cg00322003 cg01662749 cg24307368 cg04497611 cg00146240 cg00696044 cg02627240 cg03672288 cg03737947
## 200223270003_R02C01 0.8705217 0.48860949 0.7796391 0.8985067 0.9125144 0.4708431 0.61984995 0.1759911 0.3506201 0.64323677 0.9086359 0.6336151 0.55608424 0.66706843 0.9235592 0.91824910
## 200223270003_R03C01 0.2442648 0.04878986 0.5188241 0.8846201 0.9003482 0.8841930 0.71214149 0.5702070 0.2510946 0.34980461 0.8818513 0.8957183 0.07552381 0.57129408 0.6718625 0.92067153
## 200223270003_R06C01 0.2644581 0.48026945 0.5325725 0.8847132 0.6097933 0.4575646 0.06871768 0.3077122 0.8061480 0.02720398 0.5853116 0.1433218 0.79270858 0.05309659 0.9007629 0.03638091
## cg04316537 cg06118351 cg06403901 cg06483046 cg06864789 cg07138269 cg08554146 cg08857872 cg10240127 cg11187460 cg11286989 cg11314779 cg12228670 cg13372276 cg13653328 cg14293999
## 200223270003_R02C01 0.8074830 0.3633940 0.92790690 0.04383925 0.05369415 0.5002290 0.8982080 0.3395280 0.9250553 0.03672179 0.7590008 0.0242134 0.8632174 0.04888111 0.9245434 0.2836710
## 200223270003_R03C01 0.8453340 0.4714860 0.04783341 0.50720277 0.46053125 0.9426707 0.8963074 0.8181845 0.9403255 0.92516409 0.8533989 0.8966100 0.8496212 0.62396373 0.5122938 0.9172023
## 200223270003_R06C01 0.4351695 0.8655962 0.05253626 0.89604910 0.87513655 0.5057781 0.8213878 0.2970779 0.9056974 0.03109553 0.7313884 0.8908661 0.8738949 0.59693465 0.9362798 0.9168166
## cg14532717 cg14780448 cg15730644 cg15985500 cg17002338 cg17042243 cg17738613 cg18819889 cg18949721 cg21986118 cg23066280 cg23916408 cg24139837 cg25277809 cg27160885 cg05392160
## 200223270003_R02C01 0.5732280 0.9119141 0.4803181 0.8555262 0.9286251 0.2502905 0.6879612 0.9156157 0.2334245 0.6658175 0.07247841 0.1942275 0.07404605 0.1632342 0.2231606 0.9328933
## 200223270003_R03C01 0.1107638 0.6702102 0.4353906 0.8312198 0.2684163 0.2933475 0.6582258 0.9004455 0.2437792 0.6571296 0.57174588 0.9154993 0.04183445 0.4913711 0.8263885 0.2576881
## 200223270003_R06C01 0.6273416 0.6207355 0.8763048 0.8492103 0.2811103 0.2725457 0.1022257 0.9054439 0.2523095 0.7034445 0.80814756 0.8886255 0.05657120 0.5952124 0.2121179 0.8920726
## cg02631626 cg23352245 cg21139150 cg04124201 cg10666341 cg18339359 cg22169467 cg04888234 cg25059696 cg06715136 cg03600007 cg10091792 cg14192979 cg20078646 cg27224751 cg04412904
## 200223270003_R02C01 0.6280766 0.9377232 0.01853264 0.8686421 0.9046648 0.8824858 0.3095010 0.8379655 0.9017504 0.3400192 0.5658487 0.8670733 0.06336040 0.06198170 0.44503947 0.05088595
## 200223270003_R03C01 0.1951736 0.9375774 0.43223243 0.3308589 0.6731062 0.9040272 0.2978585 0.4376314 0.3047156 0.9259109 0.6018832 0.5864221 0.06019651 0.89537412 0.03214912 0.07717659
## 200223270003_R06C01 0.2699849 0.5932742 0.43772680 0.3241613 0.6443180 0.8552121 0.8955853 0.8039047 0.3051179 0.9079807 0.8611166 0.6087997 0.52114282 0.08725521 0.83123722 0.08253743
## cg17129965 cg14507637 cg14307563 cg20981163 cg22535849 cg18029737 cg14627380 cg10788927 cg08041188 cg13226272 cg11247378 cg02772171 cg04462915 cg03221390 cg22112152 cg04664583
## 200223270003_R02C01 0.8972140 0.9051258 0.1855966 0.8990628 0.8847704 0.9100454 0.9455369 0.8973154 0.7752456 0.02637249 0.1591185 0.9182018 0.03224861 0.5859063 0.8476101 0.5572814
## 200223270003_R03C01 0.8806673 0.9009460 0.8916957 0.9264076 0.8609966 0.9016634 0.9258964 0.2021398 0.3201255 0.54100016 0.7874849 0.5660559 0.50740695 0.9180706 0.8014136 0.5881190
## 200223270003_R06C01 0.8857237 0.9013686 0.8750052 0.4874651 0.8808022 0.7376586 0.5789898 0.2053075 0.7900939 0.44370701 0.4807942 0.8995479 0.02700644 0.6399867 0.7897897 0.9352717
## cg20803293 cg09451339 cg16733676 cg22741595 cg04242342 cg00295418 cg06012903 cg00345083 cg10039445 cg13368637 cg04718469 cg16089727 cg06231502 cg02550738 cg05850457 cg08896901
## 200223270003_R02C01 0.54933918 0.2243746 0.9057228 0.6525533 0.8206769 0.44954665 0.7964595 0.47960968 0.8833873 0.5597507 0.8687522 0.86748697 0.7784451 0.6201457 0.8183013 0.3581911
## 200223270003_R03C01 0.07935747 0.2340702 0.8904541 0.1730013 0.8167892 0.48471295 0.1933431 0.50833875 0.8954055 0.9100088 0.7256813 0.54996692 0.7964278 0.9011727 0.8313023 0.2467071
## 200223270003_R06C01 0.42191244 0.8921284 0.1698111 0.1550739 0.8040357 0.02004532 0.1960773 0.03929249 0.8832807 0.8739205 0.8521881 0.05876736 0.7706160 0.9085849 0.8161364 0.9225209
## cg17268094 cg01549082 cg12146221 cg06394820 cg26901661 cg12784167 cg13815695 cg01462799 cg00322820 cg02356645
## 200223270003_R02C01 0.5774753 0.2924138 0.2049284 0.8513195 0.8951971 0.81503498 0.9267057 0.8284427 0.4869764 0.5105903
## 200223270003_R03C01 0.9003262 0.7065693 0.1814927 0.8695521 0.8754981 0.02811410 0.6859729 0.4038824 0.4858988 0.5833923
## 200223270003_R06C01 0.8789368 0.2895440 0.8619250 0.4415020 0.9021064 0.03073269 0.6509046 0.4676821 0.4754313 0.5701428
## [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
df_LRM1<-processed_data
featureName_LRM1<-AfterProcess_FeatureName
library(glmnet)
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 454 251
dim(testData)
## [1] 194 251
ctrl <- trainControl(method = "cv", number = 5)
model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM1, newdata = testData,type="raw")
cm_FeatEval_Median_LRM1<-caret::confusionMatrix(predictions, testData$DX)
print(cm_FeatEval_Median_LRM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 120 17
## CN 8 49
##
## Accuracy : 0.8711
## 95% CI : (0.8157, 0.9148)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 1.656e-11
##
## Kappa : 0.7031
##
## Mcnemar's Test P-Value : 0.1096
##
## Sensitivity : 0.9375
## Specificity : 0.7424
## Pos Pred Value : 0.8759
## Neg Pred Value : 0.8596
## Prevalence : 0.6598
## Detection Rate : 0.6186
## Detection Prevalence : 0.7062
## Balanced Accuracy : 0.8400
##
## 'Positive' Class : CI
##
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_FeatEval_Median_LRM1_Accuracy <- cm_FeatEval_Median_LRM1$overall["Accuracy"]
cm_FeatEval_Median_LRM1_Kappa <- cm_FeatEval_Median_LRM1$overall["Kappa"]
print(cm_FeatEval_Median_LRM1_Accuracy)
## Accuracy
## 0.871134
print(cm_FeatEval_Median_LRM1_Kappa)
## Kappa
## 0.703146
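Cohen's kappa reported above corrects accuracy for chance agreement: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed accuracy and p_e is the agreement expected from the marginal class frequencies alone. Recomputing it from the confusion matrix printed above reproduces caret's numbers:

```r
# Confusion matrix from the output above (rows = prediction, cols = reference)
cm <- matrix(c(120, 8, 17, 49), nrow = 2,
             dimnames = list(Prediction = c("CI", "CN"),
                             Reference  = c("CI", "CN")))
n   <- sum(cm)
p_o <- sum(diag(cm)) / n                      # observed agreement (accuracy)
p_e <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
kappa <- (p_o - p_e) / (1 - p_e)
print(round(c(Accuracy = p_o, Kappa = kappa), 6))  # 0.871134 and 0.703146
```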
print(model_LRM1)
## glmnet
##
## 454 samples
## 250 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 363, 363, 363, 364
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.0001769938 0.7994628 0.5470726
## 0.10 0.0017699384 0.8060806 0.5561292
## 0.10 0.0176993845 0.8193162 0.5843843
## 0.55 0.0001769938 0.7884982 0.5192244
## 0.55 0.0017699384 0.7818803 0.5017078
## 0.55 0.0176993845 0.7289621 0.3714212
## 1.00 0.0001769938 0.7686935 0.4709982
## 1.00 0.0017699384 0.7488889 0.4371236
## 1.00 0.0176993845 0.6717460 0.2242205
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.01769938.
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
FeatEval_Median_LRM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.997797356828194"
print(FeatEval_Median_LRM1_trainAccuracy)
## [1] 0.9977974
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
print("The mean accuracy of the resampling results across tuning parameters is:")
## [1] "The mean accuracy of the resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM1)
## [1] 0.7681699
FeatEval_Median_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print(FeatEval_Median_mean_accuracy_cv_LRM1)
## [1] 0.7681699
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG ==5){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_LRM1_AUC <-auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_LRM1_AUC <-auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==3){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_LRM1_AUC <-auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
##
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
##
## Data: prob_predictions[, "CI"] in 66 controls (testData$DX CN) < 128 cases (testData$DX CI).
## Area under the curve: 0.9092
## [1] "The auc value is:"
## Area under the curve: 0.9092
if (METHOD_FEATURE_FLAG ==1){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData$DX)
for (class in classes) {
binary_labels <- ifelse(testData$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = c("blue", seq_len(length(classes) - 1) + 2), lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Median_LRM1_AUC <-mean_auc
}
print(FeatEval_Median_LRM1_AUC)
## Area under the curve: 0.9092
importance_model_LRM1 <- varImp(model_LRM1)
print(importance_model_LRM1)
## glmnet variable importance
##
## only 20 most important variables shown (out of 250)
##
## Overall
## PC3 100.00
## PC1 75.12
## PC2 58.87
## cg23432430 57.15
## cg09727210 54.10
## cg07158503 46.44
## cg00962106 45.17
## cg06697310 42.80
## cg10701746 39.69
## cg09015880 38.90
## cg16338321 38.30
## cg00819121 38.05
## cg02225060 37.05
## cg00415024 36.20
## cg26081710 36.12
## cg21757617 35.91
## cg14168080 35.75
## cg05064044 34.73
## cg02887598 33.32
## cg00004073 32.12
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")
importance_model_LRM1_df<-importance_model_LRM1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)
library(dplyr)
ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>% arrange(desc(Overall))
print(ordered_importance_final_model_LRM1)
}
## Overall
## 1 3.18842587
## 2 2.39520051
## 3 1.87706979
## 4 1.82208262
## 5 1.72486003
## 6 1.48063346
## 7 1.44023001
## 8 1.36480429
## 9 1.26563567
## 10 1.24035703
## 11 1.22110887
## 12 1.21305309
## 13 1.18147057
## 14 1.15430972
## 15 1.15152197
## 16 1.14497407
## 17 1.13975248
## 18 1.10722616
## 19 1.06235813
## 20 1.02420300
## 21 1.02185532
## 22 1.01652980
## 23 1.01169312
## 24 0.99959858
## 25 0.99167486
## 26 0.97320357
## 27 0.96863996
## 28 0.96137801
## 29 0.95340981
## 30 0.94516230
## 31 0.94444136
## 32 0.94158925
## 33 0.93561012
## 34 0.93082797
## 35 0.91401062
## 36 0.90745296
## 37 0.90243919
## 38 0.90130081
## 39 0.90117310
## 40 0.89944471
## 41 0.88338684
## 42 0.87896114
## 43 0.87442980
## 44 0.87404942
## 45 0.86190645
## 46 0.85956601
## 47 0.85931524
## 48 0.84726874
## 49 0.84712594
## 50 0.80542049
## 51 0.77666168
## 52 0.77658413
## 53 0.77550563
## 54 0.76983886
## 55 0.76826044
## 56 0.76648581
## 57 0.75910543
## 58 0.75033770
## 59 0.75015499
## 60 0.74765138
## 61 0.74701445
## 62 0.74459931
## 63 0.74292730
## 64 0.73194816
## 65 0.73154206
## 66 0.72943791
## 67 0.72752233
## 68 0.72545029
## 69 0.72530308
## 70 0.71913220
## 71 0.71725145
## 72 0.71325216
## 73 0.71183026
## 74 0.70764556
## 75 0.70437740
## 76 0.70180753
## 77 0.69872001
## 78 0.69818706
## 79 0.68824150
## 80 0.67720915
## 81 0.67693328
## 82 0.67023307
## 83 0.66811642
## 84 0.65461716
## 85 0.64734474
## 86 0.64680347
## 87 0.64387681
## 88 0.63333800
## 89 0.63256915
## 90 0.62861808
## 91 0.62825803
## 92 0.62459285
## 93 0.62387701
## 94 0.62262167
## 95 0.61525029
## 96 0.61173712
## 97 0.60855070
## 98 0.60805850
## 99 0.60228586
## 100 0.59991420
## 101 0.59799939
## 102 0.59625593
## 103 0.58635772
## 104 0.58299740
## 105 0.58222172
## 106 0.57878501
## 107 0.57732288
## 108 0.57357653
## 109 0.56954900
## 110 0.56790810
## 111 0.56096287
## 112 0.56055544
## 113 0.56015427
## 114 0.55346945
## 115 0.55291544
## 116 0.54912038
## 117 0.54188065
## 118 0.53934854
## 119 0.53721784
## 120 0.53164099
## 121 0.52988696
## 122 0.52278157
## 123 0.51772990
## 124 0.51234945
## 125 0.50135775
## 126 0.50048085
## 127 0.49493622
## 128 0.47392268
## 129 0.46924195
## 130 0.46811371
## 131 0.46345297
## 132 0.46248719
## 133 0.46198127
## 134 0.46121369
## 135 0.46044321
## 136 0.45933073
## 137 0.45287300
## 138 0.44969202
## 139 0.44910238
## 140 0.44539579
## 141 0.44441219
## 142 0.43846893
## 143 0.43281983
## 144 0.42764281
## 145 0.42090033
## 146 0.41791079
## 147 0.41610015
## 148 0.40447137
## 149 0.39487293
## 150 0.39477696
## 151 0.39285964
## 152 0.38840457
## 153 0.38704367
## 154 0.37749203
## 155 0.37684829
## 156 0.37550783
## 157 0.37283355
## 158 0.36932751
## 159 0.36926725
## 160 0.36435426
## 161 0.36237676
## 162 0.35967472
## 163 0.35706949
## 164 0.35243710
## 165 0.35081239
## 166 0.34597818
## 167 0.34363845
## 168 0.33604979
## 169 0.33421572
## 170 0.33225173
## 171 0.32909256
## 172 0.32887522
## 173 0.32686496
## 174 0.32677868
## 175 0.32560098
## 176 0.32312631
## 177 0.32177734
## 178 0.32046099
## 179 0.32035202
## 180 0.31386523
## 181 0.31116196
## 182 0.30746456
## 183 0.30434478
## 184 0.29286777
## 185 0.29003664
## 186 0.28938759
## 187 0.28550974
## 188 0.28451909
## 189 0.28349415
## 190 0.28348491
## 191 0.28205446
## 192 0.27800578
## 193 0.27236595
## 194 0.27086073
## 195 0.27047475
## 196 0.26995359
## 197 0.26797697
## 198 0.26198655
## 199 0.26148619
## 200 0.25997417
## 201 0.25974403
## 202 0.25343703
## 203 0.24248192
## 204 0.24091350
## 205 0.22950223
## 206 0.21893151
## 207 0.21500503
## 208 0.20840975
## 209 0.20814535
## 210 0.20792596
## 211 0.20398664
## 212 0.20220817
## 213 0.19352816
## 214 0.19162572
## 215 0.18799061
## 216 0.18733175
## 217 0.18423807
## 218 0.17916954
## 219 0.17718443
## 220 0.17569282
## 221 0.16183559
## 222 0.15690863
## 223 0.15021387
## 224 0.14174255
## 225 0.12645035
## 226 0.11784052
## 227 0.11465985
## 228 0.10090127
## 229 0.09003150
## 230 0.08787500
## 231 0.08696989
## 232 0.08581724
## 233 0.08463400
## 234 0.08178926
## 235 0.07610370
## 236 0.07492226
## 237 0.05059524
## 238 0.02964717
## 239 0.01765970
## 240 0.01508225
## 241 0.00000000
## 242 0.00000000
## 243 0.00000000
## 244 0.00000000
## 245 0.00000000
## 246 0.00000000
## 247 0.00000000
## 248 0.00000000
## 249 0.00000000
## 250 0.00000000
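For a two-class glmnet fit, `varImp` on the final model is essentially the absolute value of each coefficient at the selected lambda, which is why the table above is a single `Overall` column and the features shrunk exactly to zero land at the bottom with importance 0. A small stand-in sketch of the same ranking idea, using an unpenalized logistic fit on hypothetical data rather than the actual glmnet model:

```r
set.seed(123)
# Hypothetical data: f1 and f2 informative, f3 pure noise
X <- data.frame(f1 = rnorm(200), f2 = rnorm(200), f3 = rnorm(200))
y <- factor(ifelse(X$f1 - X$f2 + rnorm(200) > 0, "CI", "CN"))

fit <- glm(y ~ ., data = X, family = binomial)  # unpenalized stand-in for glmnet
importance <- sort(abs(coef(fit)[-1]), decreasing = TRUE)  # |coefficient| ranking
print(round(importance, 3))
```

The noise feature should rank last; with an elastic-net penalty it would typically be shrunk to exactly zero.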
if(METHOD_FEATURE_FLAG==1){
# For the multi-class classification case, take the maximum
# importance value across classes for each feature
# and add it as a new column.
importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
importance_model_LRM1_df <- importance_model_LRM1_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_model_LRM1_df)
}
if (!require(reshape2)) {
install.packages("reshape2")
library(reshape2)
} else {
library(reshape2)
}
if(METHOD_FEATURE_FLAG == 1){
importance_melted_LRM1_df <- importance_model_LRM1_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_model_LRM1_df %>% head(20))
print("The top 20 features based on the max-importance method:")
print(head(importance_model_LRM1_df,n=20)$Feature)
importance_melted_LRM1_df <- importance_model_LRM1_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
table(df_LRM1$DX)
##
## CI CN
## 427 221
prop.table(table(df_LRM1$DX))
##
## CI CN
## 0.6589506 0.3410494
table(trainData$DX)
##
## CI CN
## 299 155
prop.table(table(trainData$DX))
##
## CI CN
## 0.6585903 0.3414097
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")
For the training data set:
barplot(table(trainData$DX), main = "Train Data Class Distribution")
Let’s calculate the imbalance ratio: the number of samples in the majority class divided by the number of samples in the minority class. A high ratio indicates severe class imbalance.
class_counts <- table(df_LRM1$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the whole data set is:")
## [1] "The imbalance ratio of the whole data set is:"
print(imbalance_ratio)
## [1] 1.932127
class_counts <- table(trainData$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the training data set is:")
## [1] "The imbalance ratio of the training data set is:"
print(imbalance_ratio)
## [1] 1.929032
Let’s do a Chi-square test, which can determine whether the class distribution deviates significantly from a balanced distribution. The p-value from the test indicates the significance of the class imbalance.
chisq.test(table(df_LRM1$DX))
##
## Chi-squared test for given probabilities
##
## data: table(df_LRM1$DX)
## X-squared = 65.488, df = 1, p-value = 5.848e-16
chisq.test(table(trainData$DX))
##
## Chi-squared test for given probabilities
##
## data: table(trainData$DX)
## X-squared = 45.674, df = 1, p-value = 1.397e-11
library(smotefamily)
smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"], target = trainData$DX, K = 5, dup_size = 1)
balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
##
## CI CN
## 299 310
dim(balanced_data_LGR_1)
## [1] 609 251
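For intuition, SMOTE's synthetic minority samples are interpolations between a minority point and one of its K nearest minority neighbors. A base-R sketch of a single synthetic point (illustrative only, not the smotefamily implementation):

```r
set.seed(1)
# Two minority-class samples in a 2-feature space;
# x2 stands in for one of x1's K nearest minority neighbors
x1 <- c(0.2, 0.5)
x2 <- c(0.4, 0.9)
gap <- runif(1)                    # random interpolation weight in [0, 1]
synthetic <- x1 + gap * (x2 - x1)  # new point on the segment between x1 and x2
print(synthetic)
```

With `dup_size = 1`, SMOTE generates roughly one such synthetic point per minority sample, which is why the CN count grows from 155 to 310 above.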
ctrl <- trainControl(method = "cv", number = 5)
model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM2, newdata = testData)
caret::confusionMatrix(predictions, testData$DX)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 117 17
## CN 11 49
##
## Accuracy : 0.8557
## 95% CI : (0.7982, 0.9019)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 5.787e-10
##
## Kappa : 0.6713
##
## Mcnemar's Test P-Value : 0.3447
##
## Sensitivity : 0.9141
## Specificity : 0.7424
## Pos Pred Value : 0.8731
## Neg Pred Value : 0.8167
## Prevalence : 0.6598
## Detection Rate : 0.6031
## Detection Prevalence : 0.6907
## Balanced Accuracy : 0.8282
##
## 'Positive' Class : CI
##
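As a sanity check, the headline statistics above can be recomputed directly from the 2x2 table (cell counts copied from the output; 'CI' is the positive class):

```r
# Confusion-matrix cell counts from the output above
cm <- matrix(c(117, 11, 17, 49), nrow = 2,
             dimnames = list(Prediction = c("CI", "CN"),
                             Reference  = c("CI", "CN")))
accuracy    <- sum(diag(cm)) / sum(cm)           # (117 + 49) / 194
sensitivity <- cm["CI", "CI"] / sum(cm[, "CI"])  # 117 / 128
specificity <- cm["CN", "CN"] / sum(cm[, "CN"])  # 49 / 66
round(c(accuracy, sensitivity, specificity), 4)  # 0.8557 0.9141 0.7424
```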
print(model_LRM2)
## glmnet
##
## 609 samples
## 250 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 487, 488, 487, 487, 487
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.000194718 0.8998510 0.7993574
## 0.10 0.001947180 0.8998510 0.7993487
## 0.10 0.019471799 0.8916272 0.7829100
## 0.55 0.000194718 0.8817775 0.7630141
## 0.55 0.001947180 0.8719279 0.7432252
## 0.55 0.019471799 0.8424333 0.6840738
## 1.00 0.000194718 0.8637041 0.7266518
## 1.00 0.001947180 0.8637312 0.7268088
## 1.00 0.019471799 0.7865872 0.5721279
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.00194718.
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.8668323
importance_model_LRM2 <- varImp(model_LRM2)
print(importance_model_LRM2)
## glmnet variable importance
##
## only 20 most important variables shown (out of 250)
##
## Overall
## PC3 100.00
## PC1 56.77
## PC2 38.47
## cg09727210 37.36
## cg23432430 34.79
## cg10701746 32.31
## cg07158503 31.85
## cg06697310 30.72
## cg09015880 29.49
## cg16338321 29.27
## cg00962106 28.49
## cg00819121 27.36
## cg26081710 27.20
## cg05064044 27.11
## cg00154902 26.79
## cg14168080 25.73
## cg01910713 24.94
## cg02225060 24.61
## cg21757617 24.46
## cg00415024 24.33
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")
importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4|| METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)
library(dplyr)
ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))
print(ordered_importance_final_model_LRM2)
}
## Overall
## 1 9.39409905
## 2 5.33332095
## 3 3.61401060
## 4 3.50916676
## 5 3.26781163
## 6 3.03499550
## 7 2.99195070
## 8 2.88616468
## 9 2.76989824
## 10 2.74934125
## 11 2.67647712
## 12 2.56998941
## 13 2.55511336
## 14 2.54633955
## 15 2.51702295
## 16 2.41746666
## 17 2.34332478
## 18 2.31152433
## 19 2.29792871
## 20 2.28593634
## 21 2.25446940
## 22 2.24139023
## 23 2.23077992
## 24 2.20179481
## 25 2.20122355
## 26 2.12397515
## 27 2.11859322
## 28 2.11590640
## 29 2.10828412
## 30 2.10425385
## 31 2.00416932
## 32 1.99119244
## 33 1.97013624
## 34 1.94250141
## 35 1.93997768
## 36 1.93910212
## 37 1.92804417
## 38 1.92676735
## 39 1.91039200
## 40 1.89653350
## 41 1.89352775
## 42 1.88870826
## 43 1.85280986
## 44 1.84711558
## 45 1.84181509
## 46 1.82873528
## 47 1.81996631
## 48 1.78534603
## 49 1.75869637
## 50 1.72544261
## 51 1.71635712
## 52 1.69898528
## 53 1.69687931
## 54 1.69632149
## 55 1.63023756
## 56 1.60846815
## 57 1.59731145
## 58 1.58905291
## 59 1.57745226
## 60 1.57203125
## 61 1.55225017
## 62 1.55039935
## 63 1.50865818
## 64 1.50241377
## 65 1.48281286
## 66 1.47930125
## 67 1.47168246
## 68 1.46518546
## 69 1.45948715
## 70 1.45897625
## 71 1.45442584
## 72 1.45407740
## 73 1.45328407
## 74 1.44483896
## 75 1.44292232
## 76 1.42716083
## 77 1.41217219
## 78 1.40908352
## 79 1.40859045
## 80 1.40819006
## 81 1.38214012
## 82 1.38121012
## 83 1.37824336
## 84 1.37760826
## 85 1.37084457
## 86 1.36974365
## 87 1.34747956
## 88 1.34305584
## 89 1.34136305
## 90 1.33965246
## 91 1.33805998
## 92 1.33434058
## 93 1.33384339
## 94 1.33351835
## 95 1.32909066
## 96 1.32083515
## 97 1.30472650
## 98 1.29861097
## 99 1.28820652
## 100 1.28776003
## 101 1.28386112
## 102 1.28300726
## 103 1.28293163
## 104 1.28071810
## 105 1.27353972
## 106 1.23767420
## 107 1.23266459
## 108 1.21644949
## 109 1.21127960
## 110 1.20304181
## 111 1.20242087
## 112 1.19878534
## 113 1.19568771
## 114 1.18972607
## 115 1.16210774
## 116 1.15323473
## 117 1.15144637
## 118 1.13511651
## 119 1.13206409
## 120 1.11512283
## 121 1.10944738
## 122 1.10594906
## 123 1.09388850
## 124 1.08624655
## 125 1.08386893
## 126 1.08220828
## 127 1.06987759
## 128 1.06012236
## 129 1.03575091
## 130 1.01523652
## 131 0.95786320
## 132 0.94104576
## 133 0.94021910
## 134 0.93736289
## 135 0.92760836
## 136 0.92421297
## 137 0.91909747
## 138 0.91698060
## 139 0.91627299
## 140 0.88416158
## 141 0.88380868
## 142 0.87741894
## 143 0.87741495
## 144 0.87638123
## 145 0.84184534
## 146 0.83266341
## 147 0.82835894
## 148 0.82326511
## 149 0.78040929
## 150 0.77993897
## 151 0.77953213
## 152 0.76325238
## 153 0.75088132
## 154 0.73277434
## 155 0.73001267
## 156 0.72760278
## 157 0.72702111
## 158 0.72582586
## 159 0.72244894
## 160 0.71471180
## 161 0.71306226
## 162 0.70819990
## 163 0.70778707
## 164 0.69185295
## 165 0.68534399
## 166 0.67846502
## 167 0.66677025
## 168 0.66062848
## 169 0.65817693
## 170 0.65070486
## 171 0.63422384
## 172 0.62830057
## 173 0.62641976
## 174 0.62616021
## 175 0.62369562
## 176 0.62235936
## 177 0.61739965
## 178 0.60983380
## 179 0.60399371
## 180 0.60183344
## 181 0.60003127
## 182 0.58702488
## 183 0.55998096
## 184 0.54551365
## 185 0.54185617
## 186 0.53906219
## 187 0.53230705
## 188 0.52137529
## 189 0.51280273
## 190 0.51244874
## 191 0.51125652
## 192 0.50642823
## 193 0.49797586
## 194 0.49322074
## 195 0.49014646
## 196 0.46934884
## 197 0.46602585
## 198 0.46523199
## 199 0.45880197
## 200 0.45428680
## 201 0.44422728
## 202 0.42713223
## 203 0.41277787
## 204 0.40876440
## 205 0.40725044
## 206 0.40124370
## 207 0.38229474
## 208 0.37402445
## 209 0.36801628
## 210 0.36627131
## 211 0.36185407
## 212 0.33898771
## 213 0.29501266
## 214 0.28737997
## 215 0.25093751
## 216 0.24307357
## 217 0.23728746
## 218 0.21863158
## 219 0.21121457
## 220 0.18898753
## 221 0.18371601
## 222 0.17637921
## 223 0.17483811
## 224 0.17348756
## 225 0.15479794
## 226 0.15321118
## 227 0.15215028
## 228 0.14061332
## 229 0.12752344
## 230 0.11773079
## 231 0.10923983
## 232 0.10693030
## 233 0.08250856
## 234 0.08076162
## 235 0.07346061
## 236 0.06564513
## 237 0.03601894
## 238 0.03532381
## 239 0.02909394
## 240 0.00691040
## 241 0.00000000
## 242 0.00000000
## 243 0.00000000
## 244 0.00000000
## 245 0.00000000
## 246 0.00000000
## 247 0.00000000
## 248 0.00000000
## 249 0.00000000
## 250 0.00000000
if(METHOD_FEATURE_FLAG==1){
# For the multi-class classification case,
# for each feature we choose the maximum importance value across classes.
# Add a column for the maximum importance.
importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
importance_model_LRM2_df <- importance_model_LRM2_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_model_LRM2_df)
}
if(METHOD_FEATURE_FLAG == 1){
importance_melted_LRM2_df <- importance_model_LRM2_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM2_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_model_LRM2_df %>% head(20))
print("The top 20 features ranked by maximum importance:")
print(head(importance_model_LRM2_df,n=20)$Feature)
importance_melted_LRM2_df <- importance_model_LRM2_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM2_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
##
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
##
## Data: prob_predictions[, "CI"] in 66 controls (testData$DX CN) < 128 cases (testData$DX CI).
## Area under the curve: 0.9042
## [1] "The auc value is:"
## Area under the curve: 0.9042
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData$DX)
for (class in classes) {
binary_labels <- ifelse(testData$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
# Legend colors must match the curves: the first ROC is blue, the rest use col = i + 1
legend("bottomright", legend = classes, col = c("blue", seq_along(classes)[-1] + 1), lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
}
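All of the roc() calls in this section reduce to a ranking statistic: AUC is the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties count one half). A tiny hand-checkable sketch with synthetic scores:

```r
# Synthetic predicted probabilities and true binary labels
scores <- c(0.9, 0.8, 0.4, 0.7, 0.2, 0.1)
labels <- c(1, 1, 1, 0, 0, 0)
pos <- scores[labels == 1]
neg <- scores[labels == 0]
# Fraction of positive/negative pairs ranked correctly (ties score 0.5)
auc <- mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
print(auc)  # 8 of the 9 pairs are correctly ordered: 8/9
```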
df_ENM1<-processed_data
featureName_ENM1<-AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)
param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))
elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
trControl = ctrl, tuneGrid = param_grid)
print(elastic_net_model1)
## glmnet
##
## 454 samples
## 250 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 363, 363, 363, 364
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0 0.00100000 0.8369231 0.62861651
## 0 0.05357895 0.8545543 0.66586275
## 0 0.10615789 0.8457387 0.63995848
## 0 0.15873684 0.8434921 0.63267580
## 0 0.21131579 0.8412454 0.62101906
## 0 0.26389474 0.8412698 0.61615330
## 0 0.31647368 0.8324542 0.59121103
## 0 0.36905263 0.8192674 0.55342696
## 0 0.42163158 0.8192674 0.55342696
## 0 0.47421053 0.8060562 0.51338362
## 0 0.52678947 0.8038584 0.50491822
## 0 0.57936842 0.7994628 0.49039759
## 0 0.63194737 0.7950672 0.47776549
## 0 0.68452632 0.7906716 0.46490837
## 0 0.73710526 0.7840537 0.44525210
## 0 0.78968421 0.7818315 0.43824492
## 0 0.84226316 0.7796337 0.42781064
## 0 0.89484211 0.7642247 0.38014916
## 0 0.94742105 0.7598046 0.36607884
## 0 1.00000000 0.7554335 0.34703455
## 1 0.00100000 0.7554823 0.44828417
## 1 0.05357895 0.6564103 0.01945757
## 1 0.10615789 0.6585836 0.00000000
## 1 0.15873684 0.6585836 0.00000000
## 1 0.21131579 0.6585836 0.00000000
## 1 0.26389474 0.6585836 0.00000000
## 1 0.31647368 0.6585836 0.00000000
## 1 0.36905263 0.6585836 0.00000000
## 1 0.42163158 0.6585836 0.00000000
## 1 0.47421053 0.6585836 0.00000000
## 1 0.52678947 0.6585836 0.00000000
## 1 0.57936842 0.6585836 0.00000000
## 1 0.63194737 0.6585836 0.00000000
## 1 0.68452632 0.6585836 0.00000000
## 1 0.73710526 0.6585836 0.00000000
## 1 0.78968421 0.6585836 0.00000000
## 1 0.84226316 0.6585836 0.00000000
## 1 0.89484211 0.6585836 0.00000000
## 1 0.94742105 0.6585836 0.00000000
## 1 1.00000000 0.6585836 0.00000000
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 0.05357895.
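For reference, the objective glmnet tunes over is the elastic-net mixture; alpha = 0 (selected here) is pure ridge and alpha = 1 is the lasso:

```latex
\min_{\beta_0,\beta}\; \frac{1}{N}\sum_{i=1}^{N} \ell\!\left(y_i,\; \beta_0 + x_i^{\top}\beta\right)
  \;+\; \lambda \left[ \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 \;+\; \alpha\,\lVert\beta\rVert_1 \right]
```

The pure-lasso rows of the grid above collapsing to Kappa = 0 at larger lambda reflects the L1 penalty shrinking all coefficients to zero, leaving a majority-class predictor.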
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_elastic_net_model1)
## [1] 0.7355177
FeatEval_Median_mean_accuracy_cv_ENM1<-mean_accuracy_elastic_net_model1
print(FeatEval_Median_mean_accuracy_cv_ENM1)
## [1] 0.7355177
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_ENM1$DX)
FeatEval_Median_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.997797356828194"
print(FeatEval_Median_ENM1_trainAccuracy)
## [1] 0.9977974
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_FeatEval_Median_ENM1 <- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_FeatEval_Median_ENM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 123 16
## CN 5 50
##
## Accuracy : 0.8918
## 95% CI : (0.8393, 0.9317)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 7.689e-14
##
## Kappa : 0.7487
##
## Mcnemar's Test P-Value : 0.0291
##
## Sensitivity : 0.9609
## Specificity : 0.7576
## Pos Pred Value : 0.8849
## Neg Pred Value : 0.9091
## Prevalence : 0.6598
## Detection Rate : 0.6340
## Detection Prevalence : 0.7165
## Balanced Accuracy : 0.8593
##
## 'Positive' Class : CI
##
cm_FeatEval_Median_ENM1_Accuracy<-cm_FeatEval_Median_ENM1$overall["Accuracy"]
cm_FeatEval_Median_ENM1_Kappa<-cm_FeatEval_Median_ENM1$overall["Kappa"]
print(cm_FeatEval_Median_ENM1_Accuracy)
## Accuracy
## 0.8917526
print(cm_FeatEval_Median_ENM1_Kappa)
## Kappa
## 0.7487357
importance_elastic_net_model1<- varImp(elastic_net_model1)
print(importance_elastic_net_model1)
## glmnet variable importance
##
## only 20 most important variables shown (out of 250)
##
## Overall
## PC3 100.00
## PC2 76.08
## PC1 70.24
## cg23432430 55.07
## cg09727210 47.87
## cg07158503 45.65
## cg00962106 44.80
## cg06697310 40.39
## cg02225060 37.48
## cg16338321 37.33
## cg26081710 36.09
## cg00819121 36.07
## cg00415024 35.83
## cg09015880 35.12
## cg05064044 35.03
## cg10701746 34.38
## cg21757617 33.77
## cg00004073 32.82
## cg07504457 32.74
## cg06277607 32.40
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")
importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG ==4 || METHOD_FEATURE_FLAG==5 ||METHOD_FEATURE_FLAG==6 ){
importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)
library(dplyr)
Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>% arrange(desc(Overall))
print(Ordered_importance_elastic_net_final_model1)
}
## Overall
## 1 2.237153316
## 2 1.704040405
## 3 1.573965466
## 4 1.235986222
## 5 1.075567910
## 6 1.025970043
## 7 1.007096020
## 8 0.908827880
## 9 0.843943470
## 10 0.840597048
## 11 0.812985211
## 12 0.812481924
## 13 0.807145398
## 14 0.791400263
## 15 0.789313242
## 16 0.774890174
## 17 0.761323163
## 18 0.740218154
## 19 0.738421596
## 20 0.730705360
## 21 0.729944163
## 22 0.726527624
## 23 0.724879801
## 24 0.714758473
## 25 0.706189132
## 26 0.695989206
## 27 0.691363721
## 28 0.690224818
## 29 0.688758365
## 30 0.687144760
## 31 0.687003742
## 32 0.686685451
## 33 0.673996831
## 34 0.665194458
## 35 0.661584084
## 36 0.656989250
## 37 0.651083011
## 38 0.647397986
## 39 0.643699804
## 40 0.632228957
## 41 0.631206405
## 42 0.629710163
## 43 0.623651021
## 44 0.614335337
## 45 0.611002823
## 46 0.604776522
## 47 0.603145997
## 48 0.601261672
## 49 0.598003845
## 50 0.594839526
## 51 0.589913525
## 52 0.588315692
## 53 0.587640055
## 54 0.586835887
## 55 0.585731266
## 56 0.585043673
## 57 0.584145480
## 58 0.581789994
## 59 0.581035210
## 60 0.578956245
## 61 0.567481436
## 62 0.565939326
## 63 0.561721903
## 64 0.560888478
## 65 0.557207726
## 66 0.551410875
## 67 0.548207238
## 68 0.546593627
## 69 0.544378004
## 70 0.541465977
## 71 0.540479426
## 72 0.539738213
## 73 0.538037464
## 74 0.537128971
## 75 0.535156116
## 76 0.532385153
## 77 0.525161009
## 78 0.525029670
## 79 0.524070961
## 80 0.514925239
## 81 0.508845017
## 82 0.507032774
## 83 0.505712205
## 84 0.504360045
## 85 0.501845525
## 86 0.500001082
## 87 0.493794160
## 88 0.492896575
## 89 0.492167725
## 90 0.491383092
## 91 0.488709588
## 92 0.487820601
## 93 0.487623880
## 94 0.485944046
## 95 0.484692117
## 96 0.481022524
## 97 0.478674193
## 98 0.476638858
## 99 0.476231503
## 100 0.468057321
## 101 0.467062660
## 102 0.467052683
## 103 0.465093828
## 104 0.463765669
## 105 0.457111076
## 106 0.456553318
## 107 0.455677586
## 108 0.454856421
## 109 0.454502834
## 110 0.454283502
## 111 0.450787228
## 112 0.443307620
## 113 0.441211657
## 114 0.439186488
## 115 0.436290735
## 116 0.436283372
## 117 0.435181128
## 118 0.432424025
## 119 0.429630520
## 120 0.427576252
## 121 0.422019555
## 122 0.414341777
## 123 0.410262786
## 124 0.409469237
## 125 0.409181517
## 126 0.409005727
## 127 0.408635271
## 128 0.407303282
## 129 0.403279581
## 130 0.398823947
## 131 0.398394952
## 132 0.397100864
## 133 0.392021640
## 134 0.389207444
## 135 0.389153165
## 136 0.387841581
## 137 0.381068904
## 138 0.380865637
## 139 0.376759817
## 140 0.376073665
## 141 0.376029402
## 142 0.375020152
## 143 0.374455168
## 144 0.374042625
## 145 0.370596848
## 146 0.360735096
## 147 0.359465238
## 148 0.358067574
## 149 0.347099725
## 150 0.345987364
## 151 0.344197399
## 152 0.343836461
## 153 0.335781742
## 154 0.334182931
## 155 0.333390098
## 156 0.329732553
## 157 0.329093434
## 158 0.328914541
## 159 0.324752633
## 160 0.324014688
## 161 0.323309477
## 162 0.322211875
## 163 0.321738963
## 164 0.320470675
## 165 0.319463888
## 166 0.318001860
## 167 0.317813828
## 168 0.317309856
## 169 0.315911278
## 170 0.310908449
## 171 0.310296236
## 172 0.307098995
## 173 0.306865321
## 174 0.305040266
## 175 0.303799404
## 176 0.303402188
## 177 0.302901454
## 178 0.302786919
## 179 0.301771418
## 180 0.300685211
## 181 0.300297618
## 182 0.297942871
## 183 0.295600444
## 184 0.291987869
## 185 0.291820929
## 186 0.287155893
## 187 0.286586631
## 188 0.286028321
## 189 0.285948624
## 190 0.284394197
## 191 0.283220362
## 192 0.273915443
## 193 0.272224306
## 194 0.270708050
## 195 0.264361742
## 196 0.261223254
## 197 0.258589999
## 198 0.257450469
## 199 0.256924700
## 200 0.253267758
## 201 0.253099348
## 202 0.252900800
## 203 0.252665230
## 204 0.252021923
## 205 0.251622787
## 206 0.251170397
## 207 0.250901868
## 208 0.246752605
## 209 0.242207738
## 210 0.240875969
## 211 0.238734241
## 212 0.238147174
## 213 0.236636925
## 214 0.234824424
## 215 0.233831159
## 216 0.233796076
## 217 0.231916137
## 218 0.231873255
## 219 0.230148809
## 220 0.226269264
## 221 0.224988816
## 222 0.217717950
## 223 0.207813501
## 224 0.200880311
## 225 0.199912806
## 226 0.199330762
## 227 0.199263674
## 228 0.192905184
## 229 0.189981299
## 230 0.185566344
## 231 0.180842016
## 232 0.180612879
## 233 0.177355598
## 234 0.177303096
## 235 0.172482660
## 236 0.165725965
## 237 0.161592088
## 238 0.157547264
## 239 0.154069286
## 240 0.145073143
## 241 0.139702732
## 242 0.132940855
## 243 0.107622914
## 244 0.093195960
## 245 0.083808507
## 246 0.068226558
## 247 0.038284346
## 248 0.011174598
## 249 0.011093351
## 250 0.008817437
if(METHOD_FEATURE_FLAG==1){
# For the multi-class classification case,
# for each feature we choose the maximum importance value across classes.
# Add a column for the maximum importance.
importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_elastic_net_model1_df)
}
if(METHOD_FEATURE_FLAG == 1){
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_elastic_net_model1_df %>% head(20))
print("The top 20 features ranked by maximum importance:")
print(head(importance_elastic_net_model1_df,n=20)$Feature)
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_ENM1_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_ENM1_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_ENM1_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## Area under the curve: 0.9244
if (METHOD_FEATURE_FLAG ==1){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_ENM1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
# Legend colors must match the curves: the first ROC is blue, the rest use col = i + 1
legend("bottomright", legend = classes, col = c("blue", seq_along(classes)[-1] + 1), lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Median_ENM1_AUC <- mean_auc
}
print(FeatEval_Median_ENM1_AUC)
## Area under the curve: 0.9244
library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
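One housekeeping note: a cluster registered with doParallel should be released once the parallel training is done. A minimal sketch of the pattern (assuming no later chunk already stops `c2`):

```r
library(parallel)

cl <- makeCluster(2)  # illustrative 2-worker cluster, analogous to c2 above
# ... parallel model training would run here ...
stopCluster(cl)       # release the workers so no orphaned R sessions remain
```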
df_XGB1<-processed_data
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
xgb_model <- caret::train(
DX ~ ., data = trainData_XGB1,
method = "xgbTree", trControl = cv_control,
metric = "Accuracy"
)
print(xgb_model)
## eXtreme Gradient Boosting
##
## 454 samples
## 250 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 363, 363, 363, 364
## Resampling results across tuning parameters:
##
## eta max_depth colsample_bytree subsample nrounds Accuracy Kappa
## 0.3 1 0.6 0.50 50 0.6057387 0.041449103
## 0.3 1 0.6 0.50 100 0.6475214 0.161924138
## 0.3 1 0.6 0.50 150 0.6585836 0.185782714
## 0.3 1 0.6 0.75 50 0.5990476 -0.002216854
## 0.3 1 0.6 0.75 100 0.6364591 0.099264033
## 0.3 1 0.6 0.75 150 0.6387302 0.123001640
## 0.3 1 0.6 1.00 50 0.5946764 -0.060252563
## 0.3 1 0.6 1.00 100 0.6100611 0.014960886
## 0.3 1 0.6 1.00 150 0.6122589 0.028201790
## 0.3 1 0.8 0.50 50 0.6057875 0.048356602
## 0.3 1 0.8 0.50 100 0.6146276 0.099079991
## 0.3 1 0.8 0.50 150 0.6519902 0.183355781
## 0.3 1 0.8 0.75 50 0.6122589 0.028983493
## 0.3 1 0.8 0.75 100 0.6540415 0.138948595
## 0.3 1 0.8 0.75 150 0.6452991 0.126957163
## 0.3 1 0.8 1.00 50 0.6035165 -0.026984198
## 0.3 1 0.8 1.00 100 0.6167033 0.022108852
## 0.3 1 0.8 1.00 150 0.6079365 0.017623023
## 0.3 2 0.6 0.50 50 0.6410501 0.142621607
## 0.3 2 0.6 0.50 100 0.6807570 0.237143541
## 0.3 2 0.6 0.50 150 0.6653236 0.184957303
## 0.3 2 0.6 0.75 50 0.6387057 0.120043040
## 0.3 2 0.6 0.75 100 0.6585836 0.169771926
## 0.3 2 0.6 0.75 150 0.6629792 0.179529185
## 0.3 2 0.6 1.00 50 0.6079121 -0.007222195
## 0.3 2 0.6 1.00 100 0.6277656 0.055892538
## 0.3 2 0.6 1.00 150 0.6299389 0.053985694
## 0.3 2 0.8 0.50 50 0.6586569 0.156766525
## 0.3 2 0.8 0.50 100 0.6740171 0.194537184
## 0.3 2 0.8 0.50 150 0.6871795 0.229799561
## 0.3 2 0.8 0.75 50 0.6365812 0.091005548
## 0.3 2 0.8 0.75 100 0.6695238 0.178366874
## 0.3 2 0.8 0.75 150 0.6740415 0.199679968
## 0.3 2 0.8 1.00 50 0.6387546 0.061912505
## 0.3 2 0.8 1.00 100 0.6498413 0.085035944
## 0.3 2 0.8 1.00 150 0.6431990 0.091790854
## 0.3 3 0.6 0.50 50 0.6652503 0.180896232
## 0.3 3 0.6 0.50 100 0.6762882 0.202648627
## 0.3 3 0.6 0.50 150 0.6938706 0.246462463
## 0.3 3 0.6 0.75 50 0.6430525 0.108653792
## 0.3 3 0.6 0.75 100 0.6518437 0.119521656
## 0.3 3 0.6 0.75 150 0.6651038 0.154071522
## 0.3 3 0.6 1.00 50 0.6189988 0.022458659
## 0.3 3 0.6 1.00 100 0.6277900 0.044089263
## 0.3 3 0.6 1.00 150 0.6321612 0.068121586
## 0.3 3 0.8 0.50 50 0.6870818 0.220702527
## 0.3 3 0.8 0.50 100 0.6849817 0.215554491
## 0.3 3 0.8 0.50 150 0.7026618 0.265348891
## 0.3 3 0.8 0.75 50 0.6695238 0.160209824
## 0.3 3 0.8 0.75 100 0.6784127 0.181713886
## 0.3 3 0.8 0.75 150 0.6762393 0.183923734
## 0.3 3 0.8 1.00 50 0.6322100 0.055333341
## 0.3 3 0.8 1.00 100 0.6542369 0.105705666
## 0.3 3 0.8 1.00 150 0.6564103 0.123084024
## 0.4 1 0.6 0.50 50 0.6233211 0.097160524
## 0.4 1 0.6 0.50 100 0.6321368 0.135374849
## 0.4 1 0.6 0.50 150 0.6606838 0.202568903
## 0.4 1 0.6 0.75 50 0.6298901 0.066392648
## 0.4 1 0.6 0.75 100 0.6365079 0.101960122
## 0.4 1 0.6 0.75 150 0.6629548 0.170898766
## 0.4 1 0.6 1.00 50 0.5902076 -0.032360824
## 0.4 1 0.6 1.00 100 0.6276679 0.065178853
## 0.4 1 0.6 1.00 150 0.6121856 0.050628791
## 0.4 1 0.8 0.50 50 0.6034188 0.042725924
## 0.4 1 0.8 0.50 100 0.6210989 0.106274256
## 0.4 1 0.8 0.50 150 0.6518926 0.175454680
## 0.4 1 0.8 0.75 50 0.6034432 0.031972486
## 0.4 1 0.8 0.75 100 0.6254945 0.105825417
## 0.4 1 0.8 0.75 150 0.6497436 0.168478092
## 0.4 1 0.8 1.00 50 0.6034676 -0.015357775
## 0.4 1 0.8 1.00 100 0.6144322 0.045881839
## 0.4 1 0.8 1.00 150 0.6299389 0.087664407
## 0.4 2 0.6 0.50 50 0.7027350 0.285469993
## 0.4 2 0.6 0.50 100 0.7005861 0.266987845
## 0.4 2 0.6 0.50 150 0.7049817 0.282783800
## 0.4 2 0.6 0.75 50 0.6210989 0.073751328
## 0.4 2 0.6 0.75 100 0.6563614 0.152656044
## 0.4 2 0.6 0.75 150 0.6630281 0.170078589
## 0.4 2 0.6 1.00 50 0.6498168 0.108568778
## 0.4 2 0.6 1.00 100 0.6409768 0.100664179
## 0.4 2 0.6 1.00 150 0.6431502 0.094878959
## 0.4 2 0.8 0.50 50 0.6630037 0.195785877
## 0.4 2 0.8 0.50 100 0.6652015 0.192449948
## 0.4 2 0.8 0.50 150 0.6586325 0.175156997
## 0.4 2 0.8 0.75 50 0.6431990 0.122314903
## 0.4 2 0.8 0.75 100 0.6585836 0.160388809
## 0.4 2 0.8 0.75 150 0.6740171 0.188408737
## 0.4 2 0.8 1.00 50 0.6276923 0.051743625
## 0.4 2 0.8 1.00 100 0.6210989 0.056338042
## 0.4 2 0.8 1.00 150 0.6386569 0.097140541
## 0.4 3 0.6 0.50 50 0.6629548 0.187742227
## 0.4 3 0.6 0.50 100 0.6717705 0.214340749
## 0.4 3 0.6 0.50 150 0.6739438 0.216292719
## 0.4 3 0.6 0.75 50 0.6673504 0.158599829
## 0.4 3 0.6 0.75 100 0.6850549 0.193442262
## 0.4 3 0.6 0.75 150 0.6784615 0.186845647
## 0.4 3 0.6 1.00 50 0.6476679 0.087171870
## 0.4 3 0.6 1.00 100 0.6454212 0.092689221
## 0.4 3 0.6 1.00 150 0.6498413 0.109984145
## 0.4 3 0.8 0.50 50 0.6697436 0.199983622
## 0.4 3 0.8 0.50 100 0.6807326 0.227575001
## 0.4 3 0.8 0.50 150 0.6652747 0.196161747
## 0.4 3 0.8 0.75 50 0.6365568 0.104708592
## 0.4 3 0.8 0.75 100 0.6322100 0.089533084
## 0.4 3 0.8 0.75 150 0.6410256 0.113156839
## 0.4 3 0.8 1.00 50 0.6145543 0.020490574
## 0.4 3 0.8 1.00 100 0.6299145 0.055059864
## 0.4 3 0.8 1.00 150 0.6365079 0.083696706
##
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 150, max_depth = 2, eta = 0.4, gamma = 0, colsample_bytree = 0.6, min_child_weight = 1 and subsample = 0.5.
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.6460783
FeatEval_Median_mean_accuracy_cv_xgb<-mean_accuracy_xgb_model
print(FeatEval_Median_mean_accuracy_cv_xgb)
## [1] 0.6460783
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
FeatEval_Median_xgb_trainAccuracy <- train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
print(FeatEval_Median_xgb_trainAccuracy)
## [1] 1
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_FeatEval_Median_xgb <-caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_FeatEval_Median_xgb)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 110 46
## CN 18 20
##
## Accuracy : 0.6701
## 95% CI : (0.5991, 0.7358)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 0.4131203
##
## Kappa : 0.181
##
## Mcnemar's Test P-Value : 0.0007382
##
## Sensitivity : 0.8594
## Specificity : 0.3030
## Pos Pred Value : 0.7051
## Neg Pred Value : 0.5263
## Prevalence : 0.6598
## Detection Rate : 0.5670
## Detection Prevalence : 0.8041
## Balanced Accuracy : 0.5812
##
## 'Positive' Class : CI
##
cm_FeatEval_Median_xgb_Accuracy <-cm_FeatEval_Median_xgb$overall["Accuracy"]
cm_FeatEval_Median_xgb_Kappa <-cm_FeatEval_Median_xgb$overall["Kappa"]
print(cm_FeatEval_Median_xgb_Accuracy)
## Accuracy
## 0.6701031
print(cm_FeatEval_Median_xgb_Kappa)
## Kappa
## 0.1810026
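The headline numbers in this printout can be verified by hand from the four cell counts. Purely as an illustrative sanity check (sketched in Python outside the R session, with the counts copied from the confusion matrix above): Accuracy is the observed agreement, Kappa rescales it by the chance agreement implied by the row/column marginals, and the No Information Rate is simply the prevalence of the majority class.

```python
# Cell counts copied from the XGBoost confusion matrix above
# (prediction x reference, "positive" class = CI).
tp, fp, fn, tn = 110, 46, 18, 20
n = tp + fp + fn + tn  # 194 test samples

accuracy = (tp + tn) / n  # observed agreement p_o

# Expected chance agreement p_e from the marginal totals of each class.
p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
kappa = (accuracy - p_e) / (1 - p_e)

# No Information Rate = prevalence of the majority (CI) class.
nir = (tp + fn) / n

print(round(accuracy, 4), round(kappa, 4), round(nir, 4))
```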
importance_xgb_model<- varImp(xgb_model)
print(importance_xgb_model)
## xgbTree variable importance
##
## only 20 most important variables shown (out of 250)
##
## Overall
## cg19301366 100.00
## cg16431720 89.85
## cg02932958 87.78
## cg03982462 85.06
## cg25208881 84.98
## cg01008088 82.22
## cg20803293 77.23
## cg23432430 75.15
## age.now 73.22
## cg03749159 70.38
## cg07158503 68.72
## cg00004073 67.23
## cg17042243 67.12
## cg18918831 66.24
## cg22666875 63.15
## cg01128042 63.11
## cg24139837 62.80
## cg09584650 62.37
## cg24851651 61.87
## cg02887598 61.29
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")
importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)
ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
## Feature Gain Cover Frequency Importance
## <char> <num> <num> <num> <num>
## 1: cg19301366 1.984747e-02 0.0156385077 0.013089005 1.984747e-02
## 2: cg16431720 1.783360e-02 0.0144094438 0.005235602 1.783360e-02
## 3: cg02932958 1.742185e-02 0.0110611712 0.010471204 1.742185e-02
## 4: cg03982462 1.688218e-02 0.0118235203 0.007853403 1.688218e-02
## 5: cg25208881 1.686588e-02 0.0195031566 0.010471204 1.686588e-02
## ---
## 190: cg15700429 1.442304e-04 0.0005172332 0.002617801 1.442304e-04
## 191: cg06403901 1.387303e-04 0.0004864023 0.002617801 1.387303e-04
## 192: cg24883219 1.136401e-04 0.0005453364 0.002617801 1.136401e-04
## 193: cg23916408 9.287332e-05 0.0005502390 0.002617801 9.287332e-05
## 194: cg18339359 7.924531e-05 0.0005191987 0.002617801 7.924531e-05
stopCluster(c2)
registerDoSEQ()
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_xgb_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_xgb_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_xgb_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## Area under the curve: 0.7262
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_XGB1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = c("blue", 2:length(classes) + 1), lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Median_xgb_AUC <-mean_auc
}
print(FeatEval_Median_xgb_AUC)
## Area under the curve: 0.7262
library(caret)
library(randomForest)
df_RFM1<-processed_data
featureName_RFM1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]
X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)
rf_model <- caret::train(
DX ~ ., data = train_data_RFM1,
method = "rf", trControl = ctrl,
metric = "Accuracy",
importance = TRUE
)
print(rf_model)
## Random Forest
##
## 454 samples
## 250 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 363, 363, 363, 364
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.6585836 0.00000000
## 126 0.6630037 0.03883596
## 250 0.6652015 0.04322784
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 250.
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_rf_model)
## [1] 0.6622629
FeatEval_Median_mean_accuracy_cv_rf<-mean_accuracy_rf_model
print(FeatEval_Median_mean_accuracy_cv_rf)
## [1] 0.6622629
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")
train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
FeatEval_Median_rf_trainAccuracy<-train_accuracy
print(FeatEval_Median_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_FeatEval_Median_rf<-caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_FeatEval_Median_rf)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 126 65
## CN 2 1
##
## Accuracy : 0.6546
## 95% CI : (0.5832, 0.7213)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 0.5928
##
## Kappa : -6e-04
##
## Mcnemar's Test P-Value : 3.605e-14
##
## Sensitivity : 0.98438
## Specificity : 0.01515
## Pos Pred Value : 0.65969
## Neg Pred Value : 0.33333
## Prevalence : 0.65979
## Detection Rate : 0.64948
## Detection Prevalence : 0.98454
## Balanced Accuracy : 0.49976
##
## 'Positive' Class : CI
##
cm_FeatEval_Median_rf_Accuracy<-cm_FeatEval_Median_rf$overall["Accuracy"]
print(cm_FeatEval_Median_rf_Accuracy)
## Accuracy
## 0.6546392
cm_FeatEval_Median_rf_Kappa<-cm_FeatEval_Median_rf$overall["Kappa"]
print(cm_FeatEval_Median_rf_Kappa)
## Kappa
## -0.0006158584
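The near-zero Kappa deserves a closer look: the RF model predicts CN for only 3 of 194 test samples, so its agreement with the reference is almost exactly what the class marginals would produce by chance, and the balanced accuracy sits at ~0.5. An illustrative recomputation (Python, with counts copied from the confusion matrix above):

```python
# Cell counts from the RF confusion matrix (prediction x reference, positive = CI).
tp, fp, fn, tn = 126, 65, 2, 1
n = tp + fp + fn + tn

sensitivity = tp / (tp + fn)  # 126/128, nearly everything called CI
specificity = tn / (fp + tn)  # 1/66, CN is almost never detected
balanced_accuracy = (sensitivity + specificity) / 2

p_o = (tp + tn) / n
p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
kappa = (p_o - p_e) / (1 - p_e)  # essentially zero: no skill beyond chance

print(round(balanced_accuracy, 5), round(kappa, 7))
```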
importance_rf_model <- varImp(rf_model)
print(importance_rf_model)
## rf variable importance
##
## only 20 most important variables shown (out of 250)
##
## Importance
## cg23432430 100.00
## cg01008088 92.27
## cg12689021 85.29
## cg03749159 84.28
## cg03982462 81.55
## cg21697769 80.83
## cg00415024 80.34
## cg11331837 80.10
## cg25712921 75.73
## cg14532717 74.01
## cg18816397 72.29
## cg02225060 70.83
## cg09584650 70.25
## cg11133939 70.04
## cg22741595 68.68
## cg04124201 67.72
## age.now 67.03
## cg06277607 66.78
## cg14627380 66.04
## cg19503462 65.96
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")
importance_rf_model_df<-importance_rf_model$importance
if(METHOD_FEATURE_FLAG==5 ){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
# arrange() drops data.frame rownames, so keep the feature names in a column
Ordered_importance_rf_final_model <- importance_rf_final_model %>%
  tibble::rownames_to_column("Feature") %>%
  arrange(desc(MCI))
print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==6 ){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
# arrange() drops data.frame rownames, so keep the feature names in a column
Ordered_importance_rf_final_model <- importance_rf_final_model %>%
  tibble::rownames_to_column("Feature") %>%
  arrange(desc(Dementia))
print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==3 ){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
# arrange() drops data.frame rownames, so keep the feature names in a column
Ordered_importance_rf_final_model <- importance_rf_final_model %>%
  tibble::rownames_to_column("Feature") %>%
  arrange(desc(CI))
print(Ordered_importance_rf_final_model)
}
## CI CN
## 1 3.565502830 3.565502830
## 2 3.134853250 3.134853250
## 3 2.745970277 2.745970277
## 4 2.689900677 2.689900677
## 5 2.537701437 2.537701437
## ---
## 250 -2.004853154 -2.004853154
## (rows 6-249 omitted; for this binary model the CI and CN columns are identical,
## and the feature names were dropped from the rownames by arrange())
if(METHOD_FEATURE_FLAG==1){
# for the multi classification case,
# for each feature, we will choose the maximum importance value
# Add a column for the maximum importance
importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
importance_rf_model_df <- importance_rf_model_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_rf_model_df)
}
if(METHOD_FEATURE_FLAG == 1){
library(reshape2)  # for melt()
importance_melted_rf_model_df <- importance_rf_model_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_rf_model_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_rf_model_df,n=20)$Feature)
importance_melted_rf_model_df <- importance_rf_model_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_rf_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_rf_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_rf_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## Area under the curve: 0.7143
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_RFM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = c("blue", 2:length(classes) + 1), lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Median_rf_AUC<-mean_auc
}
print(FeatEval_Median_rf_AUC)
## Area under the curve: 0.7143
df_SVM<-processed_data
featureName_SVM1<-AfterProcess_FeatureName
set.seed(123)  # make the SVM train/test split reproducible, as in the RF section
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]
X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)
svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
method = "svmRadial",
trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel
##
## 454 samples
## 250 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 364, 363, 363, 363
## Resampling results across tuning parameters:
##
## C Accuracy Kappa
## 0.25 0.8457875 0.6704147
## 0.50 0.8479853 0.6718115
## 1.00 0.8436142 0.6564437
##
## Tuning parameter 'sigma' was held constant at a value of 0.002046494
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.002046494 and C = 0.5.
print(svm_model$bestTune)
## sigma C
## 2 0.002046494 0.5
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.8457957
FeatEval_Median_mean_accuracy_cv_svm<-mean_accuracy_svm_model
print(FeatEval_Median_mean_accuracy_cv_svm)
## [1] 0.8457957
train_predictions <- predict(svm_model, newdata = train_data_SVM1)
train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.984581497797357"
FeatEval_Median_svm_trainAccuracy <- train_accuracy
print(FeatEval_Median_svm_trainAccuracy)
## [1] 0.9845815
predictions <- predict(svm_model, newdata = test_data_SVM1)
cm_FeatEval_Median_svm<-caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_FeatEval_Median_svm)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 106 13
## CN 22 53
##
## Accuracy : 0.8196
## 95% CI : (0.7581, 0.871)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 5.878e-07
##
## Kappa : 0.611
##
## Mcnemar's Test P-Value : 0.1763
##
## Sensitivity : 0.8281
## Specificity : 0.8030
## Pos Pred Value : 0.8908
## Neg Pred Value : 0.7067
## Prevalence : 0.6598
## Detection Rate : 0.5464
## Detection Prevalence : 0.6134
## Balanced Accuracy : 0.8156
##
## 'Positive' Class : CI
##
cm_FeatEval_Median_svm_Accuracy <- cm_FeatEval_Median_svm$overall["Accuracy"]
cm_FeatEval_Median_svm_Kappa <- cm_FeatEval_Median_svm$overall["Kappa"]
print(cm_FeatEval_Median_svm_Accuracy)
## Accuracy
## 0.8195876
print(cm_FeatEval_Median_svm_Kappa)
## Kappa
## 0.6109774
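The remaining derived rates in the SVM printout follow the same arithmetic. An illustrative recomputation (Python, with counts copied from the SVM confusion matrix above):

```python
# Cell counts from the SVM confusion matrix (prediction x reference, positive = CI).
tp, fp, fn, tn = 106, 13, 22, 53
n = tp + fp + fn + tn

ppv = tp / (tp + fp)              # Pos Pred Value (precision for CI)
npv = tn / (fn + tn)              # Neg Pred Value
detection_rate = tp / n           # true CI calls over all test samples
balanced_accuracy = (tp / (tp + fn) + tn / (fp + tn)) / 2

print(round(ppv, 4), round(npv, 4),
      round(detection_rate, 4), round(balanced_accuracy, 4))
```

Unlike the RF model, both classes are detected at similar rates here, which is why Kappa (0.61) and balanced accuracy (0.82) are far above chance.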
Let’s take a look at the feature importance of the trained model.
library(iml)
predictor_SVM <- Predictor$new(svm_model, data = df_SVM, y = df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM, loss = "ce")
print(importance_SVM)
## Interpretation method: FeatureImp
## error function: ce
##
## Analysed predictor:
## Prediction task: classification
## Classes:
##
## Analysed data:
## Sampling from data.frame with 648 rows and 251 columns.
##
##
## Head of results:
## feature importance.05 importance importance.95 permutation.error
## 1 cg05064044 1.0285714 1.047619 1.109524 0.06790123
## 2 cg16338321 0.9809524 1.023810 1.023810 0.06635802
## 3 cg25208881 1.0000000 1.023810 1.023810 0.06635802
## 4 cg01921484 0.9857143 1.023810 1.042857 0.06635802
## 5 cg16715186 1.0000000 1.023810 1.042857 0.06635802
## 6 cg09216282 1.0000000 1.023810 1.042857 0.06635802
plot(importance_SVM)
library(vip)
vip(svm_model, method = "permute", train = train_data_SVM1,
    target = "DX", nsim = 10, metric = "bal_accuracy",
    pred_wrapper = predict)
importance_SVM_df<-importance_SVM$results
if(METHOD_FEATURE_FLAG == 5){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
FeatEval_Median_svm_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
FeatEval_Median_svm_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
FeatEval_Median_svm_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
##
## Call:
## roc.default(response = test_data_SVM1$DX, predictor = prob_predictions[, "CI"], levels = rev(levels(test_data_SVM1$DX)))
##
## Data: prob_predictions[, "CI"] in 66 controls (test_data_SVM1$DX CN) < 128 cases (test_data_SVM1$DX CI).
## Area under the curve: 0.9071
## [1] "The AUC value is:"
## Area under the curve: 0.9071
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_SVM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = c("blue", 2:length(classes) + 1), lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Median_svm_AUC <- mean_auc
}
print(FeatEval_Median_svm_AUC )
## Area under the curve: 0.9071
Performance of the output features selected by the frequency method.
processed_dataFrame<-df_process_Output_freq
processed_data<-output_Frequency_Feature
AfterProcess_FeatureName<-df_process_frequency_FeatureName
print(head(output_Frequency_Feature))
## # A tibble: 6 × 272
## DX PC1 cg23432430 cg09727210 PC2 cg00962106 cg07158503 cg06697310 cg02225060 cg09015880 cg10701746 cg16338321 cg26081710 cg00415024 cg21757617 cg14168080 cg02887598 cg05064044
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CI -0.214 0.948 0.424 0.0147 0.912 0.578 0.845 0.683 0.510 0.480 0.535 0.875 0.430 0.0365 0.419 0.0402 0.567
## 2 CN -0.173 0.946 0.881 0.0575 0.538 0.620 0.865 0.827 0.840 0.487 0.829 0.920 0.400 0.443 0.442 0.671 0.536
## 3 CN -0.00367 0.942 0.849 0.0837 0.504 0.624 0.241 0.521 0.847 0.493 0.492 0.880 0.747 0.447 0.436 0.734 0.527
## 4 CI -0.187 0.943 0.842 -0.0112 0.904 0.599 0.848 0.808 0.487 0.855 0.525 0.915 0.770 0.434 0.957 0.864 0.628
## 5 CI 0.0268 0.946 0.425 0.0000165 0.896 0.631 0.821 0.608 0.889 0.488 0.842 0.917 0.742 0.747 0.946 0.836 0.566
## 6 CN -0.0379 0.951 0.460 0.0157 0.886 0.615 0.784 0.764 0.906 0.842 0.842 0.923 0.761 0.774 0.399 0.412 0.0830
## # ℹ 254 more variables: cg01910713 <dbl>, cg11331837 <dbl>, cg07504457 <dbl>, cg00004073 <dbl>, cg04156077 <dbl>, cg10738648 <dbl>, cg07640670 <dbl>, cg16858433 <dbl>, cg12543766 <dbl>,
## # cg20685672 <dbl>, cg24851651 <dbl>, cg20678988 <dbl>, cg03088219 <dbl>, cg16536985 <dbl>, cg05234269 <dbl>, cg18285382 <dbl>, cg09216282 <dbl>, cg00084271 <dbl>, cg21697769 <dbl>,
## # cg15098922 <dbl>, cg27577781 <dbl>, cg18150287 <dbl>, cg08096656 <dbl>, cg19503462 <dbl>, cg07634717 <dbl>, cg26853071 <dbl>, cg09247979 <dbl>, cg00154902 <dbl>, cg15184869 <dbl>,
## # cg19471911 <dbl>, cg12702014 <dbl>, cg03979311 <dbl>, cg11787167 <dbl>, cg18857647 <dbl>, cg11540596 <dbl>, cg25712921 <dbl>, cg12240569 <dbl>, cg19301366 <dbl>, cg25436480 <dbl>,
## # cg13387643 <dbl>, cg12421087 <dbl>, cg11227702 <dbl>, cg00648024 <dbl>, cg17002719 <dbl>, cg15633912 <dbl>, cg16715186 <dbl>, cg11019791 <dbl>, cg06880438 <dbl>, cg03660162 <dbl>,
## # cg01008088 <dbl>, cg15535896 <dbl>, cg15600437 <dbl>, cg02078724 <dbl>, cg20823859 <dbl>, cg13372276 <dbl>, cg25208881 <dbl>, cg26679884 <dbl>, cg01921484 <dbl>, cg06960717 <dbl>,
## # cg25169289 <dbl>, cg08584917 <dbl>, cg22305850 <dbl>, cg11133939 <dbl>, cg01608425 <dbl>, cg06371647 <dbl>, cg03749159 <dbl>, cg24697433 <dbl>, cg21986118 <dbl>, cg18816397 <dbl>, …
print(df_process_frequency_FeatureName)
## [1] "PC1" "cg23432430" "cg09727210" "PC2" "cg00962106" "cg07158503" "cg06697310" "cg02225060" "cg09015880" "cg10701746" "cg16338321" "cg26081710" "cg00415024" "cg21757617" "cg14168080"
## [16] "cg02887598" "cg05064044" "cg01910713" "cg11331837" "cg07504457" "cg00004073" "cg04156077" "cg10738648" "cg07640670" "cg16858433" "cg12543766" "cg20685672" "cg24851651" "cg20678988" "cg03088219"
## [31] "cg16536985" "cg05234269" "cg18285382" "cg09216282" "cg00084271" "cg21697769" "cg15098922" "cg27577781" "cg18150287" "cg08096656" "cg19503462" "cg07634717" "cg26853071" "cg09247979" "cg00154902"
## [46] "cg15184869" "cg19471911" "cg12702014" "cg03979311" "cg11787167" "cg18857647" "cg11540596" "cg25712921" "cg12240569" "cg19301366" "cg25436480" "cg13387643" "cg12421087" "cg11227702" "cg00648024"
## [61] "cg17002719" "cg15633912" "cg16715186" "cg11019791" "cg06880438" "cg03660162" "cg01008088" "cg15535896" "cg15600437" "cg02078724" "cg20823859" "cg13372276" "cg25208881" "cg26679884" "cg01921484"
## [76] "cg06960717" "cg25169289" "cg08584917" "cg22305850" "cg11133939" "cg01608425" "cg06371647" "cg03749159" "cg24697433" "cg21986118" "cg18816397" "cg01128042" "cg15700429" "cg25277809" "cg22931151"
## [91] "cg24634455" "cg13405878" "cg02932958" "cg11286989" "cg05593887" "cg18918831" "cg11247378" "cg24139837" "cg17042243" "cg25879395" "cg18029737" "cg10681981" "cg26846609" "cg14293999" "cg10240127"
## [106] "cg08198851" "cg18993517" "cg02823329" "cg08745107" "cg13573375" "cg17738613" "cg02356645" "cg05876883" "cg24883219" "cg00696044" "cg17131279" "cg08041188" "cg24307368" "cg06961873" "cg05392160"
## [121] "cg26983017" "cg07138269" "cg04316537" "cg27224751" "cg04831745" "cg12556569" "cg17386240" "cg04412904" "cg00345083" "cg02668233" "cg10788927" "cg14687298" "cg14170504" "cg03672288" "cg14307563"
## [136] "cg09451339" "cg16431720" "cg01662749" "cg02495179" "cg04768387" "cg17002338" "cg01933473" "cg16089727" "cg24643105" "PC3" "cg00819121" "cg09120722" "cg27272246" "cg06277607" "cg03982462"
## [151] "cg09584650" "cg08788093" "cg22666875" "cg22542451" "cg00939409" "cg17723206" "cg05321907" "cg12776173" "cg25758034" "cg14710850" "cg23517115" "cg17429539" "cg17906851" "cg00512739" "cg12689021"
## [166] "cg16571124" "cg22071943" "cg25649515" "cg04497611" "cg15730644" "cg13739190" "cg25306893" "cg16779438" "cg06483046" "cg14780448" "cg06833284" "cg14507637" "cg18819889" "cg03549208" "cg15985500"
## [181] "cg05161773" "cg06403901" "cg22169467" "cg08857872" "cg11187460" "cg03600007" "cg05850457" "cg06715136" "cg10091792" "cg03221390" "cg02122327" "cg21139150" "cg14192979" "cg23352245" "cg00146240"
## [196] "cg20981163" "cg27160885" "cg00553601" "cg12146221" "cg13226272" "cg22112152" "cg23836570" "cg08554146" "cg09785377" "cg01462799" "cg06118351" "cg17129965" "cg18339359" "cg11438323" "cg00295418"
## [211] "cg08896901" "cg18526121" "cg02550738" "cg04664583" "cg07028768" "cg01549082" "cg13815695" "cg02627240" "cg19799454" "cg06864789" "cg03737947" "cg14532717" "cg22535849" "cg04718469" "cg14627380"
## [226] "cg10039445" "cg02631626" "cg20673830" "cg17268094" "cg11706829" "cg16733676" "cg20078646" "cg13368637" "cg16652920" "cg26901661" "cg04888234" "cg04242342" "cg00322820" "cg23066280" "cg07480955"
## [241] "cg02772171" "cg21243064" "cg21388339" "cg01153376" "cg15775217" "cg02621446" "cg10666341" "cg23177161" "cg02246922" "cg25174111" "cg00322003" "cg15586958" "cg06231502" "age.now" "cg18949721"
## [256] "cg12228670" "cg11314779" "cg23916408" "cg01280698" "cg04124201" "cg12784167" "cg04645024" "cg16202259" "cg11268585" "cg15501526" "cg03084184" "cg12333628" "cg21783012" "cg13038195" "cg04867412"
## [271] "cg20803293"
print(length(df_process_frequency_FeatureName))
## [1] 271
Num_KeyFea_Frequency <- length(df_process_frequency_FeatureName)
print(head(df_process_Output_freq))
## DX PC1 cg23432430 cg09727210 PC2 cg00962106 cg07158503 cg06697310 cg02225060 cg09015880 cg10701746 cg16338321 cg26081710 cg00415024 cg21757617 cg14168080
## 200223270003_R02C01 CI -0.214185447 0.9482702 0.4240111 0.01470293 0.9124898 0.5777146 0.8454609 0.6828159 0.5101716 0.4795503 0.5350242 0.8751040 0.4299553 0.03652647 0.4190123
## 200223270003_R03C01 CN -0.172761185 0.9455418 0.8812928 0.05745834 0.5375751 0.6203543 0.8653044 0.8265195 0.8402106 0.4868342 0.8294062 0.9198212 0.3999122 0.44299089 0.4420256
## 200223270003_R06C01 CN -0.003667305 0.9418716 0.8493743 0.08372861 0.5040948 0.6236025 0.2405168 0.5209552 0.8472063 0.4927257 0.4918708 0.8801892 0.7465084 0.44725379 0.4355521
## cg02887598 cg05064044 cg01910713 cg11331837 cg07504457 cg00004073 cg04156077 cg10738648 cg07640670 cg16858433 cg12543766 cg20685672 cg24851651 cg20678988 cg03088219 cg16536985
## 200223270003_R02C01 0.04020908 0.5672851 0.8573169 0.03692842 0.7116230 0.02928535 0.7321883 0.44931577 0.58296513 0.9184356 0.51028134 0.6712101 0.03674702 0.8438718 0.844002862 0.5789643
## 200223270003_R03C01 0.67073881 0.5358875 0.8538850 0.57150125 0.6854539 0.02787198 0.6865805 0.49894016 0.55225610 0.9194211 0.88741539 0.7932091 0.05358297 0.8548886 0.007435243 0.5418687
## 200223270003_R06C01 0.73408417 0.5273964 0.8110366 0.03182862 0.7205633 0.64576463 0.8501188 0.05552024 0.04058533 0.9271632 0.02818501 0.6613646 0.05968923 0.7786685 0.120155222 0.8392044
## cg05234269 cg18285382 cg09216282 cg00084271 cg21697769 cg15098922 cg27577781 cg18150287 cg08096656 cg19503462 cg07634717 cg26853071 cg09247979 cg00154902 cg15184869 cg19471911
## 200223270003_R02C01 0.93848584 0.3202927 0.9349248 0.8103611 0.8946108 0.9286092 0.8143535 0.7685695 0.9362594 0.7951675 0.7483382 0.4233820 0.5070956 0.5137741 0.8622328 0.6334393
## 200223270003_R03C01 0.57461229 0.2930577 0.9244259 0.7877006 0.2822953 0.9027517 0.8113185 0.7519166 0.9314878 0.4537684 0.8254434 0.7451354 0.5706177 0.8540746 0.8996252 0.8437175
## 200223270003_R06C01 0.02467208 0.8923595 0.9263996 0.7706165 0.8698740 0.8525611 0.8144274 0.2501173 0.4943033 0.6997359 0.8181246 0.4228079 0.5090215 0.8188126 0.8688117 0.6127952
## cg12702014 cg03979311 cg11787167 cg18857647 cg11540596 cg25712921 cg12240569 cg19301366 cg25436480 cg13387643 cg12421087 cg11227702 cg00648024 cg17002719 cg15633912 cg16715186
## 200223270003_R02C01 0.7704049 0.86644909 0.03853894 0.8582332 0.9238951 0.2829848 0.82772064 0.8831393 0.8425160 0.4229959 0.5647607 0.86486075 0.51410972 0.04939181 0.1605530 0.2742789
## 200223270003_R03C01 0.7848681 0.06199853 0.04673831 0.8394132 0.8926595 0.6220919 0.02690547 0.8072679 0.4994032 0.4200273 0.5399655 0.49184121 0.40202875 0.40466475 0.9333421 0.7946153
## 200223270003_R06C01 0.8065993 0.72615553 0.32564508 0.2647491 0.8820252 0.6384003 0.46030640 0.8796022 0.3494312 0.4161488 0.5400348 0.02543724 0.05579011 0.51428089 0.8737362 0.8124316
## cg11019791 cg06880438 cg03660162 cg01008088 cg15535896 cg15600437 cg02078724 cg20823859 cg13372276 cg25208881 cg26679884 cg01921484 cg06960717 cg25169289 cg08584917 cg22305850
## 200223270003_R02C01 0.8112324 0.8285145 0.8691767 0.8424817 0.3382952 0.4885353 0.3096774 0.9030711 0.04888111 0.1851956 0.6793815 0.9098550 0.7030978 0.1100884 0.5663205 0.03361934
## 200223270003_R03C01 0.7831231 0.7988881 0.5160770 0.2417656 0.9253926 0.4894487 0.2896133 0.6062985 0.62396373 0.9092286 0.1848705 0.9093137 0.7653402 0.7667174 0.9019732 0.57522232
## 200223270003_R06C01 0.4353250 0.7839538 0.9026304 0.2618620 0.3320191 0.8551374 0.2805612 0.8917348 0.59693465 0.9265502 0.1701734 0.9204487 0.7206218 0.2264993 0.9187789 0.58548744
## cg11133939 cg01608425 cg06371647 cg03749159 cg24697433 cg21986118 cg18816397 cg01128042 cg15700429 cg25277809 cg22931151 cg24634455 cg13405878 cg02932958 cg11286989 cg05593887
## 200223270003_R02C01 0.1282694 0.9030410 0.8336894 0.9355921 0.9243095 0.6658175 0.5472925 0.9113420 0.7879010 0.1632342 0.9311023 0.7796391 0.4549662 0.7901008 0.7590008 0.5939220
## 200223270003_R03C01 0.5920898 0.9264388 0.8198684 0.9153921 0.6808390 0.6571296 0.4940355 0.5328806 0.9114530 0.4913711 0.9356702 0.5188241 0.7858042 0.4210489 0.8533989 0.5766550
## 200223270003_R06C01 0.5127706 0.8887753 0.8069537 0.9255807 0.6384606 0.7034445 0.5337018 0.5222757 0.8838233 0.5952124 0.9328614 0.5325725 0.7583938 0.3825995 0.7313884 0.9148338
## cg18918831 cg11247378 cg24139837 cg17042243 cg25879395 cg18029737 cg10681981 cg26846609 cg14293999 cg10240127 cg08198851 cg18993517 cg02823329 cg08745107 cg13573375 cg17738613
## 200223270003_R02C01 0.4891660 0.1591185 0.07404605 0.2502905 0.88130864 0.9100454 0.7035090 0.48860949 0.2836710 0.9250553 0.6578905 0.2091538 0.9462397 0.02921338 0.8670419 0.6879612
## 200223270003_R03C01 0.5333801 0.7874849 0.04183445 0.2933475 0.02603438 0.9016634 0.7382662 0.04878986 0.9172023 0.9403255 0.6578186 0.2665896 0.6464005 0.78542320 0.1733934 0.6582258
## 200223270003_R06C01 0.6406575 0.4807942 0.05657120 0.2725457 0.91060615 0.7376586 0.6971989 0.48026945 0.9168166 0.9056974 0.1272153 0.2574003 0.9633930 0.02709928 0.8888246 0.1022257
## cg02356645 cg05876883 cg24883219 cg00696044 cg17131279 cg08041188 cg24307368 cg06961873 cg05392160 cg26983017 cg07138269 cg04316537 cg27224751 cg04831745 cg12556569 cg17386240
## 200223270003_R02C01 0.5105903 0.9039064 0.6430473 0.55608424 0.1900637 0.7752456 0.64323677 0.5335591 0.9328933 0.89868232 0.5002290 0.8074830 0.44503947 0.61984995 0.06218231 0.7473400
## 200223270003_R03C01 0.5833923 0.9223308 0.6822115 0.07552381 0.7048637 0.3201255 0.34980461 0.5472606 0.2576881 0.03145466 0.9426707 0.8453340 0.03214912 0.71214149 0.03924599 0.7144809
## 200223270003_R06C01 0.5701428 0.4697980 0.5296903 0.79270858 0.1492861 0.7900939 0.02720398 0.9415177 0.8920726 0.84677625 0.5057781 0.4351695 0.83123722 0.06871768 0.48636893 0.8074824
## cg04412904 cg00345083 cg02668233 cg10788927 cg14687298 cg14170504 cg03672288 cg14307563 cg09451339 cg16431720 cg01662749 cg02495179 cg04768387 cg17002338 cg01933473 cg16089727
## 200223270003_R02C01 0.05088595 0.47960968 0.4708431 0.8973154 0.04206702 0.54915621 0.9235592 0.1855966 0.2243746 0.7356099 0.3506201 0.6813307 0.3131047 0.9286251 0.2589014 0.86748697
## 200223270003_R03C01 0.07717659 0.50833875 0.8841930 0.2021398 0.14813581 0.02236650 0.6718625 0.8916957 0.2340702 0.8692449 0.2510946 0.7373055 0.9465814 0.2684163 0.6726133 0.54996692
## 200223270003_R06C01 0.08253743 0.03929249 0.4575646 0.2053075 0.24260002 0.02988245 0.9007629 0.8750052 0.8921284 0.8773137 0.8061480 0.5588114 0.9098563 0.2811103 0.2642560 0.05876736
## cg24643105 PC3 cg00819121 cg09120722 cg27272246 cg06277607 cg03982462 cg09584650 cg08788093 cg22666875 cg22542451 cg00939409 cg17723206 cg05321907 cg12776173 cg25758034
## 200223270003_R02C01 0.5303418 -0.014043316 0.9207001 0.5878977 0.8615873 0.10744587 0.8562777 0.08230254 0.03911678 0.8177182 0.5884356 0.2652180 0.92881042 0.2880477 0.1038804 0.6114028
## 200223270003_R03C01 0.5042688 0.005055871 0.9281472 0.8287506 0.8705287 0.09353494 0.6023731 0.09661586 0.60934160 0.8291957 0.8337068 0.8882671 0.48556255 0.1782629 0.8730635 0.6649219
## 200223270003_R06C01 0.9383050 0.029143653 0.9327211 0.8793344 0.8103777 0.09504696 0.8778458 0.52399749 0.88380243 0.3694180 0.8125084 0.8842646 0.01765023 0.8427929 0.7009491 0.2393844
## cg14710850 cg23517115 cg17429539 cg17906851 cg00512739 cg12689021 cg16571124 cg22071943 cg25649515 cg04497611 cg15730644 cg13739190 cg25306893 cg16779438 cg06483046 cg14780448
## 200223270003_R02C01 0.8048592 0.2151144 0.7860900 0.9488392 0.9337648 0.7706828 0.9282854 0.8705217 0.9279829 0.9086359 0.4803181 0.8510103 0.6265392 0.8826150 0.04383925 0.9119141
## 200223270003_R03C01 0.8090950 0.9131440 0.7100923 0.9529718 0.8863895 0.7449475 0.9206431 0.2442648 0.9235753 0.8818513 0.4353906 0.8358482 0.8330282 0.5466924 0.50720277 0.6702102
## 200223270003_R06C01 0.8285902 0.8328364 0.7660838 0.6462151 0.9242748 0.7872237 0.9276842 0.2644581 0.5895839 0.5853116 0.8763048 0.8419471 0.6175380 0.8629492 0.89604910 0.6207355
## cg06833284 cg14507637 cg18819889 cg03549208 cg15985500 cg05161773 cg06403901 cg22169467 cg08857872 cg11187460 cg03600007 cg05850457 cg06715136 cg10091792 cg03221390 cg02122327
## 200223270003_R02C01 0.9125144 0.9051258 0.9156157 0.9014487 0.8555262 0.4120912 0.92790690 0.3095010 0.3395280 0.03672179 0.5658487 0.8183013 0.3400192 0.8670733 0.5859063 0.38940091
## 200223270003_R03C01 0.9003482 0.9009460 0.9004455 0.8381784 0.8312198 0.4154907 0.04783341 0.2978585 0.8181845 0.92516409 0.6018832 0.8313023 0.9259109 0.5864221 0.9180706 0.37769608
## 200223270003_R06C01 0.6097933 0.9013686 0.9054439 0.9097817 0.8492103 0.8526849 0.05253626 0.8955853 0.2970779 0.03109553 0.8611166 0.8161364 0.9079807 0.6087997 0.6399867 0.04017909
## cg21139150 cg14192979 cg23352245 cg00146240 cg20981163 cg27160885 cg00553601 cg12146221 cg13226272 cg22112152 cg23836570 cg08554146 cg09785377 cg01462799 cg06118351 cg17129965
## 200223270003_R02C01 0.01853264 0.06336040 0.9377232 0.6336151 0.8990628 0.2231606 0.05601299 0.2049284 0.02637249 0.8476101 0.58688450 0.8982080 0.9162088 0.8284427 0.3633940 0.8972140
## 200223270003_R03C01 0.43223243 0.06019651 0.9375774 0.8957183 0.9264076 0.8263885 0.58957701 0.1814927 0.54100016 0.8014136 0.54259383 0.8963074 0.9226292 0.4038824 0.4714860 0.8806673
## 200223270003_R06C01 0.43772680 0.52114282 0.5932742 0.1433218 0.4874651 0.2121179 0.62426500 0.8619250 0.44370701 0.7897897 0.03267304 0.8213878 0.6405193 0.4676821 0.8655962 0.8857237
## cg18339359 cg11438323 cg00295418 cg08896901 cg18526121 cg02550738 cg04664583 cg07028768 cg01549082 cg13815695 cg02627240 cg19799454 cg06864789 cg03737947 cg14532717 cg22535849
## 200223270003_R02C01 0.8824858 0.4863471 0.44954665 0.3581911 0.4519781 0.6201457 0.5572814 0.4496851 0.2924138 0.9267057 0.66706843 0.9178930 0.05369415 0.91824910 0.5732280 0.8847704
## 200223270003_R03C01 0.9040272 0.8984559 0.48471295 0.2467071 0.4762313 0.9011727 0.5881190 0.8536078 0.7065693 0.6859729 0.57129408 0.9106247 0.46053125 0.92067153 0.1107638 0.8609966
## 200223270003_R06C01 0.8552121 0.8722772 0.02004532 0.9225209 0.4833367 0.9085849 0.9352717 0.8356936 0.2895440 0.6509046 0.05309659 0.9066551 0.87513655 0.03638091 0.6273416 0.8808022
## cg04718469 cg14627380 cg10039445 cg02631626 cg20673830 cg17268094 cg11706829 cg16733676 cg20078646 cg13368637 cg16652920 cg26901661 cg04888234 cg04242342 cg00322820 cg23066280
## 200223270003_R02C01 0.8687522 0.9455369 0.8833873 0.6280766 0.2422052 0.5774753 0.8897234 0.9057228 0.06198170 0.5597507 0.9436000 0.8951971 0.8379655 0.8206769 0.4869764 0.07247841
## 200223270003_R03C01 0.7256813 0.9258964 0.8954055 0.1951736 0.6881735 0.9003262 0.5444785 0.8904541 0.89537412 0.9100088 0.9431222 0.8754981 0.4376314 0.8167892 0.4858988 0.57174588
## 200223270003_R06C01 0.8521881 0.5789898 0.8832807 0.2699849 0.2134634 0.8789368 0.5669449 0.1698111 0.08725521 0.8739205 0.9457161 0.9021064 0.8039047 0.8040357 0.4754313 0.80814756
## cg07480955 cg02772171 cg21243064 cg21388339 cg01153376 cg15775217 cg02621446 cg10666341 cg23177161 cg02246922 cg25174111 cg00322003 cg15586958 cg06231502 age.now cg18949721
## 200223270003_R02C01 0.3874638 0.9182018 0.5191606 0.2756268 0.4872148 0.5707441 0.8731313 0.9046648 0.4151698 0.7301201 0.8526503 0.1759911 0.9058263 0.7784451 82.4 0.2334245
## 200223270003_R03C01 0.3916889 0.5660559 0.9167649 0.2102269 0.9639670 0.9168327 0.8095534 0.6731062 0.4586576 0.9447019 0.8573844 0.5702070 0.8957526 0.7964278 78.6 0.2437792
## 200223270003_R06C01 0.4043390 0.8995479 0.4862205 0.7649181 0.2242410 0.6042521 0.7511582 0.6443180 0.8287312 0.7202230 0.2567745 0.3077122 0.9121763 0.7706160 80.4 0.2523095
## cg12228670 cg11314779 cg23916408 cg01280698 cg04124201 cg12784167 cg04645024 cg16202259 cg11268585 cg15501526 cg03084184 cg12333628 cg21783012 cg13038195 cg04867412 cg20803293
## 200223270003_R02C01 0.8632174 0.0242134 0.1942275 0.8985067 0.8686421 0.81503498 0.7366541 0.9548726 0.2521544 0.6362531 0.8162981 0.9227884 0.9142369 0.45882213 0.04304823 0.54933918
## 200223270003_R03C01 0.8496212 0.8966100 0.9154993 0.8846201 0.3308589 0.02811410 0.8454827 0.3713483 0.8535791 0.6319253 0.7877128 0.9092861 0.6694884 0.02740132 0.87967997 0.07935747
## 200223270003_R06C01 0.8738949 0.8908661 0.8886255 0.8847132 0.3241613 0.03073269 0.0871902 0.4852461 0.9121931 0.7435100 0.4546397 0.5084647 0.9070112 0.46284376 0.44971146 0.42191244
## [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
df_LRM1<-processed_data
featureName_LRM1<-AfterProcess_FeatureName
library(glmnet)
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 454 272
dim(testData)
## [1] 194 272
ctrl <- trainControl(method = "cv", number = 5)
model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM1, newdata = testData,type="raw")
cm_FeatEval_Freq_LRM1<-caret::confusionMatrix(predictions, testData$DX)
print(cm_FeatEval_Freq_LRM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 118 17
## CN 10 49
##
## Accuracy : 0.8608
## 95% CI : (0.804, 0.9062)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 1.846e-10
##
## Kappa : 0.6818
##
## Mcnemar's Test P-Value : 0.2482
##
## Sensitivity : 0.9219
## Specificity : 0.7424
## Pos Pred Value : 0.8741
## Neg Pred Value : 0.8305
## Prevalence : 0.6598
## Detection Rate : 0.6082
## Detection Prevalence : 0.6959
## Balanced Accuracy : 0.8321
##
## 'Positive' Class : CI
##
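As a sanity check, the headline statistics above can be recomputed directly from the confusion-matrix counts. This is a minimal sketch; the four counts are hard-coded from the table printed above, with CI as the positive class:

```r
# Recompute headline metrics from the printed confusion-matrix counts
# (rows = predicted, columns = reference; positive class = CI).
TP <- 118  # predicted CI, reference CI
FP <- 17   # predicted CI, reference CN
FN <- 10   # predicted CN, reference CI
TN <- 49   # predicted CN, reference CN
sensitivity <- TP / (TP + FN)                   # 0.9219
specificity <- TN / (TN + FP)                   # 0.7424
accuracy    <- (TP + TN) / (TP + FP + FN + TN)  # 0.8608
balanced    <- (sensitivity + specificity) / 2  # 0.8321
round(c(sensitivity = sensitivity, specificity = specificity,
        accuracy = accuracy, balanced_accuracy = balanced), 4)
```

These reproduce the Sensitivity, Specificity, Accuracy, and Balanced Accuracy values reported by `caret::confusionMatrix` above.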
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_FeatEval_Freq_LRM1_Accuracy <- cm_FeatEval_Freq_LRM1$overall["Accuracy"]
cm_FeatEval_Freq_LRM1_Kappa <- cm_FeatEval_Freq_LRM1$overall["Kappa"]
print(cm_FeatEval_Freq_LRM1_Accuracy)
## Accuracy
## 0.8608247
print(cm_FeatEval_Freq_LRM1_Kappa)
## Kappa
## 0.6818127
print(model_LRM1)
## glmnet
##
## 454 samples
## 271 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 363, 363, 363, 364
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.0001769938 0.7950916 0.5407212
## 0.10 0.0017699384 0.7906960 0.5318216
## 0.10 0.0176993845 0.7884493 0.5171950
## 0.55 0.0001769938 0.7599023 0.4610387
## 0.55 0.0017699384 0.7465690 0.4220933
## 0.55 0.0176993845 0.7025885 0.3120763
## 1.00 0.0001769938 0.7290354 0.3971826
## 1.00 0.0017699384 0.7267643 0.3850125
## 1.00 0.0176993845 0.6761172 0.2344042
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.0001769938.
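Note that `caret` searched only its small default grid of `alpha`/`lambda` values here. If finer control over the elastic-net tuning is wanted, a denser grid can be supplied via `tuneGrid`. A sketch, reusing the `trainData` and `ctrl` objects defined above; the grid values are illustrative, not tuned choices:

```r
# Sketch: a denser elastic-net tuning grid for caret::train.
# The alpha/lambda ranges below are illustrative assumptions.
tune_grid <- expand.grid(
  alpha  = seq(0, 1, by = 0.25),
  lambda = 10^seq(-4, -1, length.out = 10)
)
set.seed(123)
model_LRM1_grid <- caret::train(DX ~ ., data = trainData,
                                method = "glmnet",
                                trControl = ctrl,
                                tuneGrid = tune_grid)
```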
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
FeatEval_Freq_LRM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
print(FeatEval_Freq_LRM1_trainAccuracy)
## [1] 1
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM1)
## [1] 0.7461349
FeatEval_Freq_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print(FeatEval_Freq_mean_accuracy_cv_LRM1)
## [1] 0.7461349
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG ==5){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_LRM1_AUC <- auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_LRM1_AUC <- auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==3){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_LRM1_AUC <- auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
##
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
##
## Data: prob_predictions[, "CI"] in 66 controls (testData$DX CN) < 128 cases (testData$DX CI).
## Area under the curve: 0.9074
## [1] "The auc value is:"
## Area under the curve: 0.9074
if (METHOD_FEATURE_FLAG ==1){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData$DX)
for (class in classes) {
binary_labels <- ifelse(testData$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Freq_LRM1_AUC <- mean_auc
}
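An alternative to averaging the one-vs-rest AUCs manually is pROC's built-in multi-class AUC, which averages pairwise class comparisons (Hand-Till) rather than one-vs-rest, so the two numbers need not agree exactly. A sketch, reusing `prob_predictions` from above:

```r
# Sketch: pROC's multi-class AUC as a cross-check on the
# one-vs-rest mean computed above (pairwise averaging, so the
# value can differ slightly from the one-vs-rest mean).
if (METHOD_FEATURE_FLAG == 1) {
  mc_roc <- pROC::multiclass.roc(testData$DX, prob_predictions)
  print(mc_roc$auc)
}
```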
importance_model_LRM1 <- varImp(model_LRM1)
print(importance_model_LRM1)
## glmnet variable importance
##
## only 20 most important variables shown (out of 271)
##
## Overall
## PC3 100.00
## PC1 51.90
## cg09727210 37.16
## PC2 36.46
## cg23432430 32.91
## cg06697310 30.77
## cg07158503 30.61
## cg10701746 30.49
## cg09015880 28.97
## cg00962106 28.44
## cg00415024 26.14
## cg16858433 25.31
## cg14168080 25.13
## cg01910713 25.04
## cg02225060 24.45
## cg16338321 24.38
## cg26081710 23.76
## cg00819121 23.65
## cg05064044 23.20
## cg04156077 22.86
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")
importance_model_LRM1_df<-importance_model_LRM1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 ||METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)
library(dplyr)
ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>% arrange(desc(Overall))
print(ordered_importance_final_model_LRM1)
}
## Overall
## 1 15.92734220
## 2 8.26553545
## 3 5.91813200
## 4 5.80674878
## 5 5.24165459
## 6 4.90007485
## 7 4.87592489
## 8 4.85551455
## 9 4.61478136
## 10 4.53022365
## 11 4.16272340
## 12 4.03129520
## 13 4.00281089
## 14 3.98774531
## 15 3.89346533
## 16 3.88254220
## 17 3.78423706
## 18 3.76634572
## 19 3.69539997
## 20 3.64111247
## 21 3.62906951
## 22 3.50557204
## 23 3.47185706
## 24 3.41089554
## 25 3.37023298
## 26 3.33784876
## 27 3.24295118
## 28 3.18783280
## 29 3.17118905
## 30 3.15948395
## 31 3.15207396
## 32 3.14412859
## 33 3.09415071
## 34 3.06957299
## 35 2.94472526
## 36 2.89071688
## 37 2.88287554
## 38 2.86128463
## 39 2.85118996
## 40 2.84574903
## 41 2.81554859
## 42 2.80927697
## 43 2.80517185
## 44 2.75032214
## 45 2.74544241
## 46 2.72229785
## 47 2.71512892
## 48 2.70563004
## 49 2.68881594
## 50 2.67497092
## 51 2.65305149
## 52 2.65034016
## 53 2.64675913
## 54 2.64283671
## 55 2.63956272
## 56 2.63382078
## 57 2.55010382
## 58 2.54179777
## 59 2.54152160
## 60 2.53843703
## 61 2.51642841
## 62 2.46763922
## 63 2.45223586
## 64 2.44231686
## 65 2.43971963
## 66 2.43296810
## 67 2.41903787
## 68 2.34682815
## 69 2.33677971
## 70 2.31543299
## 71 2.28517697
## 72 2.28085467
## 73 2.26032445
## 74 2.25632869
## 75 2.24244679
## 76 2.22270304
## 77 2.20897369
## 78 2.20668909
## 79 2.19300211
## 80 2.15743202
## 81 2.15571784
## 82 2.13557122
## 83 2.12759925
## 84 2.12413107
## 85 2.10878172
## 86 2.09867379
## 87 2.08523013
## 88 2.07564935
## 89 2.06332618
## 90 2.05504806
## 91 2.04866093
## 92 2.04764123
## 93 2.04510807
## 94 2.03968425
## 95 2.03310649
## 96 2.02310863
## 97 2.02090364
## 98 2.01771715
## 99 2.01126801
## 100 2.00858787
## 101 1.97862630
## 102 1.95031494
## 103 1.93946310
## 104 1.92216973
## 105 1.85780256
## 106 1.85331233
## 107 1.84938178
## 108 1.84788084
## 109 1.83724884
## 110 1.83363744
## 111 1.82774671
## 112 1.80738549
## 113 1.79558711
## 114 1.78021849
## 115 1.77421627
## 116 1.73387532
## 117 1.70530495
## 118 1.69074980
## 119 1.68177368
## 120 1.68049238
## 121 1.67459279
## 122 1.65826010
## 123 1.63648018
## 124 1.60782698
## 125 1.60611836
## 126 1.60053194
## 127 1.59955019
## 128 1.56860331
## 129 1.56552464
## 130 1.55338208
## 131 1.54756177
## 132 1.53882246
## 133 1.52324462
## 134 1.51394830
## 135 1.50855095
## 136 1.48594821
## 137 1.47217799
## 138 1.44952536
## 139 1.44184955
## 140 1.43378675
## 141 1.42108908
## 142 1.41992839
## 143 1.41932249
## 144 1.39363199
## 145 1.38538965
## 146 1.37306250
## 147 1.37055390
## 148 1.36808893
## 149 1.36142615
## 150 1.35462389
## 151 1.35345413
## 152 1.34872905
## 153 1.34292994
## 154 1.31499444
## 155 1.31138144
## 156 1.28674891
## 157 1.27758196
## 158 1.27044973
## 159 1.26575020
## 160 1.26292765
## 161 1.26246812
## 162 1.25048145
## 163 1.24048572
## 164 1.23670010
## 165 1.22925034
## 166 1.21403958
## 167 1.21131559
## 168 1.20066084
## 169 1.17808268
## 170 1.16629580
## 171 1.16390465
## 172 1.08804726
## 173 1.07394317
## 174 1.04406658
## 175 1.03191534
## 176 1.03182960
## 177 1.02102657
## 178 1.01607669
## 179 0.99371527
## 180 0.98240805
## 181 0.97660766
## 182 0.97033624
## 183 0.95422433
## 184 0.94655603
## 185 0.91845543
## 186 0.90735773
## 187 0.89674758
## 188 0.89625854
## 189 0.87775208
## 190 0.86602268
## 191 0.86132515
## 192 0.84916831
## 193 0.84315319
## 194 0.84220267
## 195 0.84003417
## 196 0.82543963
## 197 0.82275814
## 198 0.78033920
## 199 0.75428505
## 200 0.75097759
## 201 0.74134410
## 202 0.73624509
## 203 0.73510359
## 204 0.72034140
## 205 0.71810061
## 206 0.71654019
## 207 0.71596246
## 208 0.71224241
## 209 0.71074903
## 210 0.70262216
## 211 0.69122376
## 212 0.65799677
## 213 0.63378638
## 214 0.63378289
## 215 0.62351543
## 216 0.61428282
## 217 0.59611272
## 218 0.59606282
## 219 0.59277437
## 220 0.58208494
## 221 0.57437958
## 222 0.57070692
## 223 0.52074019
## 224 0.50658547
## 225 0.50548040
## 226 0.49464179
## 227 0.47263809
## 228 0.45940064
## 229 0.45940029
## 230 0.45569540
## 231 0.42800884
## 232 0.36835256
## 233 0.36272264
## 234 0.35424810
## 235 0.34831125
## 236 0.31323372
## 237 0.30473683
## 238 0.29793263
## 239 0.29642103
## 240 0.29363882
## 241 0.28951455
## 242 0.27106675
## 243 0.26684098
## 244 0.26372796
## 245 0.23993772
## 246 0.22894579
## 247 0.22884951
## 248 0.19553089
## 249 0.17487794
## 250 0.15908749
## 251 0.12810849
## 252 0.12441775
## 253 0.11685277
## 254 0.09415279
## 255 0.07780898
## 256 0.07364417
## 257 0.06672353
## 258 0.06619426
## 259 0.05178736
## 260 0.04787197
## 261 0.03467063
## 262 0.02487359
## 263 0.02331246
## 264 0.01847575
## 265 0.00000000
## 266 0.00000000
## 267 0.00000000
## 268 0.00000000
## 269 0.00000000
## 270 0.00000000
## 271 0.00000000
if(METHOD_FEATURE_FLAG==1){
# For the multi-class case, keep each feature's maximum
# importance across the classes and sort by it.
importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
importance_model_LRM1_df <- importance_model_LRM1_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_model_LRM1_df)
}
if (!require(reshape2)) {
install.packages("reshape2")
library(reshape2)
}
if(METHOD_FEATURE_FLAG == 1){
importance_melted_LRM1_df <- importance_model_LRM1_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_model_LRM1_df %>% head(20))
print("Top 20 features by maximum importance across classes:")
print(head(importance_model_LRM1_df,n=20)$Feature)
importance_melted_LRM1_df <- importance_model_LRM1_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
table(df_LRM1$DX)
##
## CI CN
## 427 221
prop.table(table(df_LRM1$DX))
##
## CI CN
## 0.6589506 0.3410494
table(trainData$DX)
##
## CI CN
## 299 155
prop.table(table(trainData$DX))
##
## CI CN
## 0.6585903 0.3414097
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")
For the training data set:
barplot(table(trainData$DX), main = "Train Data Class Distribution")
Let’s calculate the imbalance ratio: the number of samples in the majority class divided by the number of samples in the minority class. A high ratio indicates severe class imbalance.
class_counts <- table(df_LRM1$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the whole data set is:")
## [1] "The imbalance ratio of the whole data set is:"
print(imbalance_ratio)
## [1] 1.932127
class_counts <- table(trainData$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the training data set is:")
## [1] "The imbalance ratio of the training data set is:"
print(imbalance_ratio)
## [1] 1.929032
Let’s run a Chi-square goodness-of-fit test, which can determine whether the class distribution deviates significantly from a balanced one. The test’s p-value indicates how significant the class imbalance is.
chisq.test(table(df_LRM1$DX))
##
## Chi-squared test for given probabilities
##
## data: table(df_LRM1$DX)
## X-squared = 65.488, df = 1, p-value = 5.848e-16
chisq.test(table(trainData$DX))
##
## Chi-squared test for given probabilities
##
## data: table(trainData$DX)
## X-squared = 45.674, df = 1, p-value = 1.397e-11
library(smotefamily)
smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"], target = trainData$DX, K = 5, dup_size = 1)
balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
##
## CI CN
## 299 310
dim(balanced_data_LGR_1)
## [1] 609 272
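The oversampled set is nearly balanced; recomputing the imbalance ratio confirms it. A small check using the counts printed above:

```r
# Imbalance ratio after SMOTE: ~1.04 (310 CN vs 299 CI),
# versus ~1.93 before oversampling.
smote_counts <- table(balanced_data_LGR_1$DX)
max(smote_counts) / min(smote_counts)
```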
ctrl <- trainControl(method = "cv", number = 5)
model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM2, newdata = testData)
caret::confusionMatrix(predictions, testData$DX)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 119 16
## CN 9 50
##
## Accuracy : 0.8711
## 95% CI : (0.8157, 0.9148)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 1.656e-11
##
## Kappa : 0.7054
##
## Mcnemar's Test P-Value : 0.2301
##
## Sensitivity : 0.9297
## Specificity : 0.7576
## Pos Pred Value : 0.8815
## Neg Pred Value : 0.8475
## Prevalence : 0.6598
## Detection Rate : 0.6134
## Detection Prevalence : 0.6959
## Balanced Accuracy : 0.8436
##
## 'Positive' Class : CI
##
print(model_LRM2)
## glmnet
##
## 609 samples
## 271 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 488, 487, 487, 487, 487
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.0001799134 0.8767918 0.7530306
## 0.10 0.0017991337 0.8751660 0.7497261
## 0.10 0.0179913372 0.8735266 0.7464651
## 0.55 0.0001799134 0.8571196 0.7134450
## 0.55 0.0017991337 0.8571061 0.7134548
## 0.55 0.0179913372 0.8242650 0.6475373
## 1.00 0.0001799134 0.8472565 0.6936802
## 1.00 0.0017991337 0.8456171 0.6903176
## 1.00 0.0179913372 0.7980219 0.5948105
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.0001799134.
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.8505412
importance_model_LRM2 <- varImp(model_LRM2)
print(importance_model_LRM2)
## glmnet variable importance
##
## only 20 most important variables shown (out of 271)
##
## Overall
## PC3 100.00
## PC1 49.89
## PC2 36.15
## cg09727210 35.79
## cg23432430 30.16
## cg06697310 29.33
## cg07158503 28.54
## cg10701746 27.66
## cg00962106 27.25
## cg09015880 27.09
## cg16858433 25.65
## cg00415024 24.88
## cg01910713 24.71
## cg16338321 24.05
## cg14168080 23.89
## cg00819121 23.50
## cg02225060 23.18
## cg26081710 22.12
## cg05064044 22.03
## cg04156077 22.02
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")
importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG == 3 || METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 5 || METHOD_FEATURE_FLAG == 6){
importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)
library(dplyr)
ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))
print(ordered_importance_final_model_LRM2)
}
## Overall
## 1 16.818284630
## 2 8.390124327
## 3 6.079895098
## 4 6.019900448
## 5 5.072437562
## 6 4.933163688
## 7 4.800568592
## 8 4.652198970
## 9 4.583160745
## 10 4.556894270
## 11 4.313569222
## 12 4.184896804
## 13 4.156555526
## 14 4.045221482
## 15 4.017299156
## 16 3.952266201
## 17 3.898326470
## 18 3.719819195
## 19 3.705545286
## 20 3.702919494
## 21 3.697922855
## 22 3.547474602
## 23 3.532132827
## 24 3.459807600
## 25 3.458232690
## 26 3.283237195
## 27 3.276363051
## 28 3.262527678
## 29 3.167618871
## 30 3.161433452
## 31 3.146722148
## 32 3.093366099
## 33 3.084722440
## 34 3.074611214
## 35 3.045836125
## 36 3.010128723
## 37 2.970464786
## 38 2.910184980
## 39 2.878553776
## 40 2.878328101
## 41 2.870179191
## 42 2.829816472
## 43 2.829179141
## 44 2.822823097
## 45 2.806355508
## 46 2.779699032
## 47 2.757678143
## 48 2.742122630
## 49 2.736684216
## 50 2.714657415
## 51 2.706960014
## 52 2.633456581
## 53 2.629645682
## 54 2.617818890
## 55 2.615843512
## 56 2.576988205
## 57 2.562507470
## 58 2.521614613
## 59 2.518489090
## 60 2.502329759
## 61 2.488413141
## 62 2.463563596
## 63 2.463152369
## 64 2.453800318
## 65 2.437603457
## 66 2.427899947
## 67 2.406967403
## 68 2.401943320
## 69 2.381439895
## 70 2.352112960
## 71 2.347925166
## 72 2.323097935
## 73 2.314971690
## 74 2.302855905
## 75 2.272636364
## 76 2.244150273
## 77 2.242578563
## 78 2.220261857
## 79 2.208174967
## 80 2.207626122
## 81 2.200042058
## 82 2.192250060
## 83 2.187195643
## 84 2.185219258
## 85 2.157745295
## 86 2.143668997
## 87 2.136667541
## 88 2.133521482
## 89 2.115308420
## 90 2.094530313
## 91 2.087136467
## 92 2.085284939
## 93 2.052704827
## 94 2.040811695
## 95 2.033893599
## 96 2.031418304
## 97 2.023988338
## 98 2.003403068
## 99 2.000654041
## 100 1.996017000
## 101 1.974675425
## 102 1.962637816
## 103 1.944596934
## 104 1.939796868
## 105 1.920413644
## 106 1.902180602
## 107 1.892551477
## 108 1.892005066
## 109 1.889915061
## 110 1.855152784
## 111 1.848215389
## 112 1.834025609
## 113 1.785792676
## 114 1.778301071
## 115 1.773927258
## 116 1.756937720
## 117 1.753796218
## 118 1.749222895
## 119 1.731421647
## 120 1.721380455
## 121 1.719240261
## 122 1.715930521
## 123 1.698384903
## 124 1.682069528
## 125 1.667590663
## 126 1.643062600
## 127 1.641578766
## 128 1.635449866
## 129 1.627147053
## 130 1.622680613
## 131 1.577777185
## 132 1.576540145
## 133 1.576296223
## 134 1.571810508
## 135 1.571447475
## 136 1.524509457
## 137 1.507767488
## 138 1.507286195
## 139 1.485251213
## 140 1.474368654
## 141 1.457008395
## 142 1.447998741
## 143 1.438450761
## 144 1.434259615
## 145 1.432045302
## 146 1.429609257
## 147 1.419881991
## 148 1.418803105
## 149 1.417168704
## 150 1.401069991
## 151 1.343664591
## 152 1.340264739
## 153 1.337448187
## 154 1.330927069
## 155 1.318894292
## 156 1.289084210
## 157 1.275707957
## 158 1.254327516
## 159 1.253951736
## 160 1.241595266
## 161 1.235414852
## 162 1.213498655
## 163 1.212020845
## 164 1.210458975
## 165 1.196246267
## 166 1.186332633
## 167 1.161181928
## 168 1.143244894
## 169 1.139968390
## 170 1.125522954
## 171 1.115315401
## 172 1.084998451
## 173 1.063963864
## 174 1.034917888
## 175 1.032132637
## 176 1.015597103
## 177 1.005738210
## 178 0.999978446
## 179 0.989772919
## 180 0.989754432
## 181 0.975015067
## 182 0.974193410
## 183 0.945026434
## 184 0.941493764
## 185 0.932228635
## 186 0.930227967
## 187 0.919925395
## 188 0.916365835
## 189 0.908534570
## 190 0.900171040
## 191 0.887591914
## 192 0.876311254
## 193 0.858279311
## 194 0.848962742
## 195 0.813378972
## 196 0.806029950
## 197 0.804228287
## 198 0.783145170
## 199 0.781992194
## 200 0.776514930
## 201 0.771393503
## 202 0.745743264
## 203 0.725800447
## 204 0.720243639
## 205 0.711138549
## 206 0.687510870
## 207 0.686998704
## 208 0.679301485
## 209 0.678444574
## 210 0.657421255
## 211 0.651592532
## 212 0.646577574
## 213 0.646049748
## 214 0.635258323
## 215 0.631440204
## 216 0.622266486
## 217 0.596522235
## 218 0.589152431
## 219 0.570879288
## 220 0.558786606
## 221 0.535866250
## 222 0.525586474
## 223 0.512179472
## 224 0.505183815
## 225 0.491087596
## 226 0.484311922
## 227 0.465876744
## 228 0.465295092
## 229 0.450072540
## 230 0.420597788
## 231 0.406153018
## 232 0.392776978
## 233 0.377449167
## 234 0.352726489
## 235 0.324540249
## 236 0.319449978
## 237 0.317436989
## 238 0.310735275
## 239 0.304530249
## 240 0.291345427
## 241 0.288739787
## 242 0.269736981
## 243 0.266945228
## 244 0.266287802
## 245 0.237734142
## 246 0.207579880
## 247 0.198084446
## 248 0.175476012
## 249 0.169487562
## 250 0.159368173
## 251 0.126781963
## 252 0.116142122
## 253 0.089216160
## 254 0.084046321
## 255 0.078869851
## 256 0.071151577
## 257 0.043414415
## 258 0.032761215
## 259 0.011854184
## 260 0.004996699
## 261 0.000000000
## 262 0.000000000
## 263 0.000000000
## 264 0.000000000
## 265 0.000000000
## 266 0.000000000
## 267 0.000000000
## 268 0.000000000
## 269 0.000000000
## 270 0.000000000
## 271 0.000000000
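The ordered importance table above feeds the Top-N selection described in the header. A minimal sketch of that step, using a small hypothetical importance data frame in place of `importance_model_LRM2_df`:

```r
# Hypothetical varImp()-style data frame (row names are the features;
# values echo the top of the importance printout above).
importance_df <- data.frame(
  Overall = c(100, 49.89, 36.15, 35.79),
  row.names = c("PC3", "PC1", "PC2", "cg09727210")
)
top_n <- 2
# Sort descending by importance, then keep the first top_n row names.
ordered <- importance_df[order(-importance_df$Overall), , drop = FALSE]
top_features <- rownames(ordered)[seq_len(top_n)]
```

The same pattern applies to any of the models in this section: only the importance data frame and `top_n` change.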
if(METHOD_FEATURE_FLAG==1){
# for the multi classification case,
# for each feature, we will choose the maximum importance value
# Add a column for the maximum importance
importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
importance_model_LRM2_df <- importance_model_LRM2_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_model_LRM2_df)
}
if(METHOD_FEATURE_FLAG == 1){
importance_melted_LRM2_df <- importance_model_LRM2_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM2_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_model_LRM2_df %>% head(20))
print("The top 20 features based on the max-importance method:")
print(head(importance_model_LRM2_df, n = 20)$Feature)
importance_melted_LRM2_df <- importance_model_LRM2_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM2_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
##
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
##
## Data: prob_predictions[, "CI"] in 66 controls (testData$DX CN) < 128 cases (testData$DX CI).
## Area under the curve: 0.9048
## [1] "The auc value is:"
## Area under the curve: 0.9048
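The AUC that `pROC::roc` reports above is equivalent to the Mann-Whitney U statistic: the probability that a randomly chosen case outscores a randomly chosen control. A toy base-R check (the scores and labels below are made up for illustration):

```r
scores <- c(0.9, 0.8, 0.7, 0.4, 0.3, 0.2)   # hypothetical P(CI) predictions
labels <- c(1, 1, 0, 1, 0, 0)               # 1 = case (CI), 0 = control (CN)
r <- rank(scores)
n_pos <- sum(labels == 1)
n_neg <- sum(labels == 0)
# Rank-sum form of the Mann-Whitney U statistic, normalized to [0, 1].
auc <- (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

Here `auc` is 8/9, i.e. 8 of the 9 case-control pairs are correctly ordered.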
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData$DX)
for (class in classes) {
binary_labels <- ifelse(testData$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = c("blue", seq(3, length(classes) + 1)), lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
}
df_ENM1<-processed_data
featureName_ENM1<-AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)
param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))
elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
trControl = ctrl, tuneGrid = param_grid)
print(elastic_net_model1)
## glmnet
##
## 454 samples
## 271 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 363, 363, 363, 364
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0 0.00100000 0.8060806 0.56042286
## 0 0.05357895 0.8214896 0.58612440
## 0 0.10615789 0.8193162 0.57402163
## 0 0.15873684 0.8170940 0.56534536
## 0 0.21131579 0.8192918 0.56553276
## 0 0.26389474 0.8148718 0.54883821
## 0 0.31647368 0.8126740 0.54000531
## 0 0.36905263 0.8016361 0.50914903
## 0 0.42163158 0.7950427 0.48786531
## 0 0.47421053 0.8016606 0.50030582
## 0 0.52678947 0.8038584 0.50490320
## 0 0.57936842 0.8016850 0.49697937
## 0 0.63194737 0.7862759 0.45208014
## 0 0.68452632 0.7862759 0.45018815
## 0 0.73710526 0.7796581 0.43020078
## 0 0.78968421 0.7796581 0.42825732
## 0 0.84226316 0.7708181 0.40042035
## 0 0.89484211 0.7708181 0.40042035
## 0 0.94742105 0.7642247 0.38002033
## 0 1.00000000 0.7576313 0.35950233
## 1 0.00100000 0.7333822 0.40247635
## 1 0.05357895 0.6564103 0.01945757
## 1 0.10615789 0.6585836 0.00000000
## 1 0.15873684 0.6585836 0.00000000
## 1 0.21131579 0.6585836 0.00000000
## 1 0.26389474 0.6585836 0.00000000
## 1 0.31647368 0.6585836 0.00000000
## 1 0.36905263 0.6585836 0.00000000
## 1 0.42163158 0.6585836 0.00000000
## 1 0.47421053 0.6585836 0.00000000
## 1 0.52678947 0.6585836 0.00000000
## 1 0.57936842 0.6585836 0.00000000
## 1 0.63194737 0.6585836 0.00000000
## 1 0.68452632 0.6585836 0.00000000
## 1 0.73710526 0.6585836 0.00000000
## 1 0.78968421 0.6585836 0.00000000
## 1 0.84226316 0.6585836 0.00000000
## 1 0.89484211 0.6585836 0.00000000
## 1 0.94742105 0.6585836 0.00000000
## 1 1.00000000 0.6585836 0.00000000
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 0.05357895.
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_elastic_net_model1)
## [1] 0.728859
FeatEval_Freq_mean_accuracy_cv_ENM1<-mean_accuracy_elastic_net_model1
print(FeatEval_Freq_mean_accuracy_cv_ENM1)
## [1] 0.728859
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_ENM1$DX)
FeatEval_Freq_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.997797356828194"
print(FeatEval_Freq_ENM1_trainAccuracy)
## [1] 0.9977974
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_FeatEval_Freq_ENM1 <- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_FeatEval_Freq_ENM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 124 17
## CN 4 49
##
## Accuracy : 0.8918
## 95% CI : (0.8393, 0.9317)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 7.689e-14
##
## Kappa : 0.7468
##
## Mcnemar's Test P-Value : 0.008829
##
## Sensitivity : 0.9688
## Specificity : 0.7424
## Pos Pred Value : 0.8794
## Neg Pred Value : 0.9245
## Prevalence : 0.6598
## Detection Rate : 0.6392
## Detection Prevalence : 0.7268
## Balanced Accuracy : 0.8556
##
## 'Positive' Class : CI
##
cm_FeatEval_Freq_ENM1_Accuracy<-cm_FeatEval_Freq_ENM1$overall["Accuracy"]
cm_FeatEval_Freq_ENM1_Kappa<-cm_FeatEval_Freq_ENM1$overall["Kappa"]
print(cm_FeatEval_Freq_ENM1_Accuracy)
## Accuracy
## 0.8917526
print(cm_FeatEval_Freq_ENM1_Kappa)
## Kappa
## 0.7467993
importance_elastic_net_model1<- varImp(elastic_net_model1)
print(importance_elastic_net_model1)
## glmnet variable importance
##
## only 20 most important variables shown (out of 271)
##
## Overall
## PC3 100.00
## PC2 99.27
## PC1 86.23
## cg23432430 64.21
## cg09727210 59.83
## cg07158503 57.69
## cg00962106 54.89
## cg06697310 51.23
## cg02225060 50.18
## cg09015880 47.62
## cg16338321 46.05
## cg26081710 43.78
## cg00819121 43.52
## cg00415024 42.70
## cg05064044 41.42
## cg01910713 41.39
## cg10701746 41.20
## cg27272246 40.77
## cg06277607 40.45
## cg02887598 40.15
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")
importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG == 3 || METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 5 || METHOD_FEATURE_FLAG == 6){
importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)
library(dplyr)
Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>% arrange(desc(Overall))
print(Ordered_importance_elastic_net_final_model1)
}
## Overall
## 1 1.8493115613
## 2 1.8358013771
## 3 1.5946672436
## 4 1.1876736663
## 5 1.1067238101
## 6 1.0670838066
## 7 1.0153638313
## 8 0.9477193404
## 9 0.9283402490
## 10 0.8809056124
## 11 0.8518929277
## 12 0.8098635645
## 13 0.8052137932
## 14 0.7899369844
## 15 0.7662250778
## 16 0.7658172477
## 17 0.7621766472
## 18 0.7543094710
## 19 0.7483591793
## 20 0.7428418527
## 21 0.7370550001
## 22 0.7345879704
## 23 0.7291891992
## 24 0.7237499559
## 25 0.7234664891
## 26 0.7176835313
## 27 0.7103256136
## 28 0.6933711824
## 29 0.6910379390
## 30 0.6909253276
## 31 0.6889075545
## 32 0.6778879684
## 33 0.6694058886
## 34 0.6495919138
## 35 0.6472677787
## 36 0.6466759647
## 37 0.6436230870
## 38 0.6419744634
## 39 0.6416811412
## 40 0.6401104316
## 41 0.6382530670
## 42 0.6375973534
## 43 0.6303285637
## 44 0.6298241867
## 45 0.6104356490
## 46 0.6004971240
## 47 0.6001497818
## 48 0.5884350622
## 49 0.5877142067
## 50 0.5866597569
## 51 0.5858906733
## 52 0.5853965581
## 53 0.5769541631
## 54 0.5766865973
## 55 0.5765756344
## 56 0.5713605652
## 57 0.5662492758
## 58 0.5649821706
## 59 0.5639555642
## 60 0.5563301612
## 61 0.5554272925
## 62 0.5496899748
## 63 0.5454664951
## 64 0.5448004913
## 65 0.5446274489
## 66 0.5387270092
## 67 0.5363102173
## 68 0.5362314921
## 69 0.5351129224
## 70 0.5325256170
## 71 0.5300768193
## 72 0.5291223498
## 73 0.5282280096
## 74 0.5271573438
## 75 0.5257567326
## 76 0.5229583408
## 77 0.5191119635
## 78 0.5154802945
## 79 0.5154027133
## 80 0.5144019458
## 81 0.5102932226
## 82 0.5090123941
## 83 0.5088852586
## 84 0.5065934678
## 85 0.5037571785
## 86 0.5023174669
## 87 0.5017157418
## 88 0.4971967579
## 89 0.4958806900
## 90 0.4946592979
## 91 0.4905155697
## 92 0.4896797836
## 93 0.4853499311
## 94 0.4817303965
## 95 0.4797668895
## 96 0.4790947038
## 97 0.4734622154
## 98 0.4710825245
## 99 0.4696588407
## 100 0.4669147028
## 101 0.4620626133
## 102 0.4610588959
## 103 0.4609080027
## 104 0.4598423587
## 105 0.4589796090
## 106 0.4504446076
## 107 0.4373777102
## 108 0.4370669809
## 109 0.4361943331
## 110 0.4335484239
## 111 0.4333827656
## 112 0.4296113833
## 113 0.4265132297
## 114 0.4258303767
## 115 0.4208723745
## 116 0.4208508164
## 117 0.4187083705
## 118 0.4162067601
## 119 0.4158067962
## 120 0.4151394270
## 121 0.4095282911
## 122 0.4069226179
## 123 0.4004364072
## 124 0.3982299986
## 125 0.3966540029
## 126 0.3949188399
## 127 0.3939919367
## 128 0.3896701395
## 129 0.3851262540
## 130 0.3834717302
## 131 0.3772182892
## 132 0.3762560920
## 133 0.3744889877
## 134 0.3728007636
## 135 0.3716097100
## 136 0.3715629978
## 137 0.3713296252
## 138 0.3702936431
## 139 0.3673695143
## 140 0.3642798962
## 141 0.3627276349
## 142 0.3614046908
## 143 0.3609346302
## 144 0.3580747433
## 145 0.3567424549
## 146 0.3555622616
## 147 0.3531851737
## 148 0.3512146612
## 149 0.3470954054
## 150 0.3463001278
## 151 0.3441222629
## 152 0.3385432894
## 153 0.3364968999
## 154 0.3356213656
## 155 0.3337241209
## 156 0.3327535424
## 157 0.3315367843
## 158 0.3308802498
## 159 0.3305939419
## 160 0.3302764633
## 161 0.3268513775
## 162 0.3192226054
## 163 0.3185161538
## 164 0.3160464282
## 165 0.3149973467
## 166 0.3141318614
## 167 0.3084220374
## 168 0.3077499088
## 169 0.3077282023
## 170 0.3073826389
## 171 0.3067509685
## 172 0.3062632827
## 173 0.3051063316
## 174 0.3040890204
## 175 0.3031317321
## 176 0.3013387831
## 177 0.3012498081
## 178 0.2999740727
## 179 0.2993524471
## 180 0.2966790175
## 181 0.2966382204
## 182 0.2948615357
## 183 0.2896771269
## 184 0.2893892721
## 185 0.2854971309
## 186 0.2837102254
## 187 0.2777232361
## 188 0.2757692073
## 189 0.2724632156
## 190 0.2716497354
## 191 0.2700906132
## 192 0.2694163701
## 193 0.2680488904
## 194 0.2667896485
## 195 0.2666732582
## 196 0.2629907604
## 197 0.2607211185
## 198 0.2588766294
## 199 0.2587502563
## 200 0.2526856587
## 201 0.2522339904
## 202 0.2506223932
## 203 0.2484234101
## 204 0.2469683571
## 205 0.2467512442
## 206 0.2465621628
## 207 0.2455025137
## 208 0.2451478928
## 209 0.2416462134
## 210 0.2408669474
## 211 0.2406027707
## 212 0.2399025198
## 213 0.2390230887
## 214 0.2380426677
## 215 0.2320773903
## 216 0.2296522868
## 217 0.2269593461
## 218 0.2267005655
## 219 0.2264591244
## 220 0.2245633017
## 221 0.2216640101
## 222 0.2213863565
## 223 0.2202808788
## 224 0.2187358387
## 225 0.2183393943
## 226 0.2181917400
## 227 0.2084412253
## 228 0.2069746386
## 229 0.2034632790
## 230 0.2022178798
## 231 0.2000299851
## 232 0.1977926363
## 233 0.1964109856
## 234 0.1913950354
## 235 0.1879510169
## 236 0.1857348268
## 237 0.1816455319
## 238 0.1779820954
## 239 0.1779750370
## 240 0.1769033380
## 241 0.1709653389
## 242 0.1700003097
## 243 0.1682468269
## 244 0.1672010662
## 245 0.1641445535
## 246 0.1632085082
## 247 0.1628849938
## 248 0.1619756944
## 249 0.1619512493
## 250 0.1581173278
## 251 0.1575501892
## 252 0.1524909997
## 253 0.1521293734
## 254 0.1482324149
## 255 0.1448325823
## 256 0.1332072552
## 257 0.1172551906
## 258 0.1133975009
## 259 0.1089556076
## 260 0.1053521773
## 261 0.0950868732
## 262 0.0813087633
## 263 0.0767632044
## 264 0.0552451974
## 265 0.0514541861
## 266 0.0443793791
## 267 0.0407841481
## 268 0.0181078631
## 269 0.0105839280
## 270 0.0027205719
## 271 0.0005666358
if(METHOD_FEATURE_FLAG==1){
# for the multi classification case,
# for each feature, we will choose the maximum importance value
# Add a column for the maximum importance
importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_elastic_net_model1_df)
}
if(METHOD_FEATURE_FLAG == 1){
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_elastic_net_model1_df %>% head(20))
print("The top 20 features based on the max-importance method:")
print(head(importance_elastic_net_model1_df, n = 20)$Feature)
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_ENM1_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4||METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_ENM1_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_ENM1_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## Area under the curve: 0.9315
if (METHOD_FEATURE_FLAG ==1){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_ENM1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = c("blue", seq(3, length(classes) + 1)), lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Freq_ENM1_AUC<-mean_auc
}
print(FeatEval_Freq_ENM1_AUC)
## Area under the curve: 0.9315
library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1<-processed_data
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
xgb_model <- caret::train(
DX ~ ., data = trainData_XGB1,
method = "xgbTree", trControl = cv_control,
metric = "Accuracy"
)
print(xgb_model)
## eXtreme Gradient Boosting
##
## 454 samples
## 271 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 363, 363, 363, 364
## Resampling results across tuning parameters:
##
## eta max_depth colsample_bytree subsample nrounds Accuracy Kappa
## 0.3 1 0.6 0.50 50 0.6146276 0.054908838
## 0.3 1 0.6 0.50 100 0.6256166 0.081796805
## 0.3 1 0.6 0.50 150 0.6432723 0.126856863
## 0.3 1 0.6 0.75 50 0.6299145 0.076070615
## 0.3 1 0.6 0.75 100 0.6321368 0.091184457
## 0.3 1 0.6 0.75 150 0.6629792 0.174070786
## 0.3 1 0.6 1.00 50 0.6013187 -0.041548935
## 0.3 1 0.6 1.00 100 0.6254945 0.045271510
## 0.3 1 0.6 1.00 150 0.6254701 0.058088855
## 0.3 1 0.8 0.50 50 0.6410745 0.110650208
## 0.3 1 0.8 0.50 100 0.6696215 0.206447934
## 0.3 1 0.8 0.50 150 0.6675214 0.203828227
## 0.3 1 0.8 0.75 50 0.6014164 -0.018674793
## 0.3 1 0.8 0.75 100 0.6146764 0.066136854
## 0.3 1 0.8 0.75 150 0.6454701 0.138914454
## 0.3 1 0.8 1.00 50 0.5991209 -0.042358922
## 0.3 1 0.8 1.00 100 0.6166545 0.029709360
## 0.3 1 0.8 1.00 150 0.6211966 0.059065952
## 0.3 2 0.6 0.50 50 0.6917460 0.226621807
## 0.3 2 0.6 0.50 100 0.6916972 0.236360996
## 0.3 2 0.6 0.50 150 0.6938950 0.256663466
## 0.3 2 0.6 0.75 50 0.6321123 0.088794883
## 0.3 2 0.6 0.75 100 0.6629792 0.148837573
## 0.3 2 0.6 0.75 150 0.6850305 0.193276544
## 0.3 2 0.6 1.00 50 0.6476679 0.092450308
## 0.3 2 0.6 1.00 100 0.6344567 0.063108930
## 0.3 2 0.6 1.00 150 0.6431990 0.091487310
## 0.3 2 0.8 0.50 50 0.6475946 0.144508904
## 0.3 2 0.8 0.50 100 0.6563370 0.164804668
## 0.3 2 0.8 0.50 150 0.6695726 0.194322990
## 0.3 2 0.8 0.75 50 0.6410012 0.102926406
## 0.3 2 0.8 0.75 100 0.6674969 0.167633971
## 0.3 2 0.8 0.75 150 0.6675214 0.170698970
## 0.3 2 0.8 1.00 50 0.6344078 0.076389473
## 0.3 2 0.8 1.00 100 0.6410501 0.083690477
## 0.3 2 0.8 1.00 150 0.6543101 0.120746584
## 0.3 3 0.6 0.50 50 0.6827350 0.200728389
## 0.3 3 0.6 0.50 100 0.6959951 0.239172563
## 0.3 3 0.6 0.50 150 0.7136264 0.289917693
## 0.3 3 0.6 0.75 50 0.6365812 0.094015704
## 0.3 3 0.6 0.75 100 0.6520147 0.134724253
## 0.3 3 0.6 0.75 150 0.6674237 0.174092992
## 0.3 3 0.6 1.00 50 0.6277167 0.039385040
## 0.3 3 0.6 1.00 100 0.6344322 0.053748339
## 0.3 3 0.6 1.00 150 0.6322100 0.047571775
## 0.3 3 0.8 0.50 50 0.6740415 0.190759415
## 0.3 3 0.8 0.50 100 0.6828083 0.205652081
## 0.3 3 0.8 0.50 150 0.6916484 0.229551613
## 0.3 3 0.8 0.75 50 0.6410256 0.088331764
## 0.3 3 0.8 0.75 100 0.6563858 0.132888499
## 0.3 3 0.8 0.75 150 0.6519658 0.126876857
## 0.3 3 0.8 1.00 50 0.6388034 0.071728537
## 0.3 3 0.8 1.00 100 0.6542125 0.115431674
## 0.3 3 0.8 1.00 150 0.6542369 0.103789283
## 0.4 1 0.6 0.50 50 0.6388523 0.136508711
## 0.4 1 0.6 0.50 100 0.6762149 0.235915834
## 0.4 1 0.6 0.50 150 0.7004884 0.297921992
## 0.4 1 0.6 0.75 50 0.5879609 0.017619170
## 0.4 1 0.6 0.75 100 0.6410012 0.137929378
## 0.4 1 0.6 0.75 150 0.6499389 0.156140289
## 0.4 1 0.6 1.00 50 0.5990476 -0.019074926
## 0.4 1 0.6 1.00 100 0.6013431 0.004761008
## 0.4 1 0.6 1.00 150 0.6211722 0.064869505
## 0.4 1 0.8 0.50 50 0.6078632 0.075417232
## 0.4 1 0.8 0.50 100 0.6518926 0.164997842
## 0.4 1 0.8 0.50 150 0.6739194 0.223978821
## 0.4 1 0.8 0.75 50 0.6321856 0.098521482
## 0.4 1 0.8 0.75 100 0.6278144 0.103092505
## 0.4 1 0.8 0.75 150 0.6343834 0.119960199
## 0.4 1 0.8 1.00 50 0.5945543 -0.051606636
## 0.4 1 0.8 1.00 100 0.5990476 -0.004820168
## 0.4 1 0.8 1.00 150 0.6167766 0.067564974
## 0.4 2 0.6 0.50 50 0.6212943 0.109003764
## 0.4 2 0.6 0.50 100 0.6366300 0.136130595
## 0.4 2 0.6 0.50 150 0.6652259 0.189714886
## 0.4 2 0.6 0.75 50 0.6585104 0.157922161
## 0.4 2 0.6 0.75 100 0.6740171 0.192270104
## 0.4 2 0.6 0.75 150 0.6806349 0.212798146
## 0.4 2 0.6 1.00 50 0.6299634 0.074903123
## 0.4 2 0.6 1.00 100 0.6387790 0.085814383
## 0.4 2 0.6 1.00 150 0.6476190 0.112969503
## 0.4 2 0.8 0.50 50 0.6277656 0.113518254
## 0.4 2 0.8 0.50 100 0.6696703 0.193457486
## 0.4 2 0.8 0.50 150 0.6807082 0.228893323
## 0.4 2 0.8 0.75 50 0.6476435 0.134335094
## 0.4 2 0.8 0.75 100 0.6475946 0.128323611
## 0.4 2 0.8 0.75 150 0.6652015 0.182324605
## 0.4 2 0.8 1.00 50 0.6475946 0.113857078
## 0.4 2 0.8 1.00 100 0.6586325 0.137368428
## 0.4 2 0.8 1.00 150 0.6608303 0.149771016
## 0.4 3 0.6 0.50 50 0.6784615 0.222287479
## 0.4 3 0.6 0.50 100 0.6915995 0.248303704
## 0.4 3 0.6 0.50 150 0.6894505 0.240392779
## 0.4 3 0.6 0.75 50 0.6365324 0.108994896
## 0.4 3 0.6 0.75 100 0.6365568 0.102537766
## 0.4 3 0.6 0.75 150 0.6520147 0.141759767
## 0.4 3 0.6 1.00 50 0.6388034 0.053838745
## 0.4 3 0.6 1.00 100 0.6542125 0.105825845
## 0.4 3 0.6 1.00 150 0.6520391 0.098852081
## 0.4 3 0.8 0.50 50 0.6563858 0.144810525
## 0.4 3 0.8 0.50 100 0.6585348 0.143394171
## 0.4 3 0.8 0.50 150 0.6783394 0.203188245
## 0.4 3 0.8 0.75 50 0.6344078 0.115372746
## 0.4 3 0.8 0.75 100 0.6431746 0.128434039
## 0.4 3 0.8 0.75 150 0.6476435 0.128580660
## 0.4 3 0.8 1.00 50 0.6540904 0.128964574
## 0.4 3 0.8 1.00 100 0.6452991 0.094187912
## 0.4 3 0.8 1.00 150 0.6540904 0.119240164
##
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 150, max_depth = 3, eta = 0.3, gamma = 0, colsample_bytree = 0.6, min_child_weight = 1 and subsample = 0.5.
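Since the 108-combination search above settled on a single winner, re-knits can pin `tuneGrid` to that combination and skip the search. The grid column names below follow caret's `xgbTree` parameterization as printed in the output; the commented `caret::train` call is a sketch reusing the objects defined above:

```r
# One-row grid pinned to the winning hyperparameters reported above.
best_grid <- expand.grid(
  nrounds = 150, max_depth = 3, eta = 0.3, gamma = 0,
  colsample_bytree = 0.6, min_child_weight = 1, subsample = 0.5
)
# xgb_model_fixed <- caret::train(DX ~ ., data = trainData_XGB1,
#                                 method = "xgbTree", trControl = cv_control,
#                                 tuneGrid = best_grid, metric = "Accuracy")
```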
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.648166
FeatEval_Freq_mean_accuracy_cv_xgb<-mean_accuracy_xgb_model
print(FeatEval_Freq_mean_accuracy_cv_xgb)
## [1] 0.648166
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
FeatEval_Freq_xgb_trainAccuracy <- train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
print(FeatEval_Freq_xgb_trainAccuracy)
## [1] 1
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_FeatEval_Freq_xgb <-caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_FeatEval_Freq_xgb)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 113 46
## CN 15 20
##
## Accuracy : 0.6856
## 95% CI : (0.6152, 0.7502)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 0.2490658
##
## Kappa : 0.2097
##
## Mcnemar's Test P-Value : 0.0001225
##
## Sensitivity : 0.8828
## Specificity : 0.3030
## Pos Pred Value : 0.7107
## Neg Pred Value : 0.5714
## Prevalence : 0.6598
## Detection Rate : 0.5825
## Detection Prevalence : 0.8196
## Balanced Accuracy : 0.5929
##
## 'Positive' Class : CI
##
cm_FeatEval_Freq_xgb_Accuracy <-cm_FeatEval_Freq_xgb$overall["Accuracy"]
cm_FeatEval_Freq_xgb_Kappa <-cm_FeatEval_Freq_xgb$overall["Kappa"]
print(cm_FeatEval_Freq_xgb_Accuracy)
## Accuracy
## 0.685567
print(cm_FeatEval_Freq_xgb_Kappa)
## Kappa
## 0.2096968
importance_xgb_model<- varImp(xgb_model)
print(importance_xgb_model)
## xgbTree variable importance
##
## only 20 most important variables shown (out of 271)
##
## Overall
## cg26983017 100.00
## cg07504457 96.11
## cg23916408 90.50
## cg11187460 82.69
## cg18285382 79.66
## cg11787167 78.36
## cg03749159 75.14
## cg05161773 72.63
## cg21697769 68.71
## cg15633912 67.82
## PC2 66.79
## cg06697310 63.20
## cg11331837 62.96
## cg15600437 62.33
## cg02823329 61.35
## cg05876883 60.52
## cg25436480 57.95
## cg16202259 56.92
## cg07158503 55.87
## cg19301366 54.30
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")
importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)
ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
## Feature Gain Cover Frequency Importance
## <char> <num> <num> <num> <num>
## 1: cg26983017 2.205161e-02 0.0115536297 0.005253940 2.205161e-02
## 2: cg07504457 2.119472e-02 0.0123073576 0.008756567 2.119472e-02
## 3: cg23916408 1.995766e-02 0.0178778555 0.008756567 1.995766e-02
## 4: cg11187460 1.823413e-02 0.0122026553 0.007005254 1.823413e-02
## 5: cg18285382 1.756551e-02 0.0136433644 0.005253940 1.756551e-02
## ---
## 232: cg04497611 8.810252e-05 0.0004279974 0.001751313 8.810252e-05
## 233: cg21243064 8.531156e-05 0.0004553075 0.001751313 8.531156e-05
## 234: cg17723206 5.402373e-05 0.0006496913 0.001751313 5.402373e-05
## 235: cg20078646 1.750849e-05 0.0003917131 0.001751313 1.750849e-05
## 236: cg01910713 1.473237e-05 0.0004157667 0.001751313 1.473237e-05
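The `varImp` scores printed earlier appear to be these raw `Gain` values rescaled so that the top feature scores 100. A minimal sketch using two gain values copied from the table above:

```r
# Rescale raw xgboost Gain to the 0-100 scale that caret::varImp reports.
gain <- c(cg26983017 = 2.205161e-02, cg07504457 = 2.119472e-02)
scaled <- 100 * gain / max(gain)
round(scaled, 2)  # cg26983017 -> 100.00, cg07504457 -> 96.11
```

The rescaled value for cg07504457 (96.11) matches the varImp output above, so the two tables give the same ranking on different scales.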
stopCluster(c2)
registerDoSEQ()
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_xgb_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_xgb_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_xgb_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## Area under the curve: 0.7293
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_XGB1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Freq_xgb_AUC <- mean_auc
}
print(FeatEval_Freq_xgb_AUC)
## Area under the curve: 0.7293
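For reference, the AUC that `pROC::roc()` reports can be read as the probability that a randomly chosen case receives a higher predicted probability than a randomly chosen control. A rank-based sketch of that equivalence, using made-up labels and scores rather than the model output above:

```r
# AUC via the Mann-Whitney U statistic: the fraction of (positive, negative)
# pairs in which the positive case gets the higher score.
auc_rank <- function(labels, scores) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}
auc_rank(c(1, 1, 0, 0), c(0.9, 0.8, 0.4, 0.3))  # perfectly separated: 1
```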
library(caret)
library(randomForest)
df_RFM1<-processed_data
featureName_RFM1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]
X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)
rf_model <- caret::train(
DX ~ ., data = train_data_RFM1,
method = "rf", trControl = ctrl,
metric = "Accuracy",
importance = TRUE
)
print(rf_model)
## Random Forest
##
## 454 samples
## 271 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 363, 363, 363, 364
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.6630037 0.01679579
## 136 0.6718437 0.06462992
## 271 0.6586081 0.04595409
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 136.
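caret's default grid only probed three `mtry` values (2, 136, 271); the conventional classification default, `floor(sqrt(p))`, is not among them. A hedged sketch of a custom grid one could pass via `tuneGrid` (the candidate values here are illustrative):

```r
# Custom mtry grid around the sqrt(p) heuristic for p = 271 predictors.
p <- 271
tune_grid <- expand.grid(mtry = c(floor(sqrt(p)), floor(p / 3), floor(p / 2)))
tune_grid$mtry  # 16, 90, 135
```

This could then be supplied as `caret::train(..., method = "rf", tuneGrid = tune_grid)`.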
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_rf_model)
## [1] 0.6644851
FeatEval_Freq_mean_accuracy_cv_rf<-mean_accuracy_rf_model
print(FeatEval_Freq_mean_accuracy_cv_rf)
## [1] 0.6644851
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")
train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
FeatEval_Freq_rf_trainAccuracy<-train_accuracy
print(FeatEval_Freq_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_FeatEval_Freq_rf<-caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_FeatEval_Freq_rf)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 128 62
## CN 0 4
##
## Accuracy : 0.6804
## 95% CI : (0.6098, 0.7454)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 0.3
##
## Kappa : 0.0785
##
## Mcnemar's Test P-Value : 9.408e-15
##
## Sensitivity : 1.00000
## Specificity : 0.06061
## Pos Pred Value : 0.67368
## Neg Pred Value : 1.00000
## Prevalence : 0.65979
## Detection Rate : 0.65979
## Detection Prevalence : 0.97938
## Balanced Accuracy : 0.53030
##
## 'Positive' Class : CI
##
cm_FeatEval_Freq_rf_Accuracy<-cm_FeatEval_Freq_rf$overall["Accuracy"]
print(cm_FeatEval_Freq_rf_Accuracy)
## Accuracy
## 0.6804124
cm_FeatEval_Freq_rf_Kappa<-cm_FeatEval_Freq_rf$overall["Kappa"]
print(cm_FeatEval_Freq_rf_Kappa)
## Kappa
## 0.07845541
importance_rf_model <- varImp(rf_model)
print(importance_rf_model)
## rf variable importance
##
## only 20 most important variables shown (out of 271)
##
## Importance
## cg21697769 100.00
## cg11331837 98.62
## cg03749159 88.25
## cg11133939 86.05
## cg01008088 85.80
## cg03982462 84.41
## cg05234269 82.48
## cg07138269 81.47
## cg00004073 81.43
## cg18857647 80.97
## cg11314779 78.30
## cg02887598 78.13
## cg01910713 77.48
## cg09120722 76.36
## cg09584650 75.14
## cg17268094 74.22
## cg25879395 73.47
## cg04888234 73.21
## cg04768387 72.82
## cg07158503 72.48
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")
importance_rf_model_df<-importance_rf_model$importance
if( METHOD_FEATURE_FLAG==5){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(MCI))
print(Ordered_importance_rf_final_model)
}
if( METHOD_FEATURE_FLAG==4||METHOD_FEATURE_FLAG==6){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(Dementia))
print(Ordered_importance_rf_final_model)
}
if( METHOD_FEATURE_FLAG==3){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(CI))
print(Ordered_importance_rf_final_model)
}
## CI CN
## 1 2.7875875467 2.7875875467
## 2 2.7222191735 2.7222191735
## 3 2.2322067610 2.2322067610
## 4 2.1280199497 2.1280199497
## 5 2.1161400764 2.1161400764
## 6 2.0505829780 2.0505829780
## 7 1.9590061962 1.9590061962
## 8 1.9116267791 1.9116267791
## 9 1.9094896734 1.9094896734
## 10 1.8879546622 1.8879546622
## 11 1.7616249927 1.7616249927
## 12 1.7535804913 1.7535804913
## 13 1.7226756514 1.7226756514
## 14 1.6699584311 1.6699584311
## 15 1.6120484124 1.6120484124
## 16 1.5687887056 1.5687887056
## 17 1.5331154656 1.5331154656
## 18 1.5207745177 1.5207745177
## 19 1.5024244818 1.5024244818
## 20 1.4865795375 1.4865795375
## 21 1.4622730635 1.4622730635
## 22 1.4583231854 1.4583231854
## 23 1.4497536779 1.4497536779
## 24 1.4472665331 1.4472665331
## 25 1.4455641078 1.4455641078
## 26 1.4197021378 1.4197021378
## 27 1.4183708437 1.4183708437
## 28 1.3704330366 1.3704330366
## 29 1.3630114523 1.3630114523
## 30 1.3467348848 1.3467348848
## 31 1.3184220360 1.3184220360
## 32 1.3165446664 1.3165446664
## 33 1.3067956999 1.3067956999
## 34 1.3053695749 1.3053695749
## 35 1.2380382363 1.2380382363
## 36 1.2293425574 1.2293425574
## 37 1.2224020892 1.2224020892
## 38 1.2169307639 1.2169307639
## 39 1.1097280115 1.1097280115
## 40 1.1034993781 1.1034993781
## 41 1.0815521836 1.0815521836
## 42 1.0496202332 1.0496202332
## 43 1.0096525877 1.0096525877
## 44 1.0022162870 1.0022162870
## 45 0.9946322009 0.9946322009
## 46 0.9833587919 0.9833587919
## 47 0.9652707883 0.9652707883
## 48 0.9566642502 0.9566642502
## 49 0.9299639748 0.9299639748
## 50 0.9204012473 0.9204012473
## 51 0.8908010399 0.8908010399
## 52 0.8812282057 0.8812282057
## 53 0.8805020278 0.8805020278
## 54 0.8644525651 0.8644525651
## 55 0.8534893219 0.8534893219
## 56 0.8483108996 0.8483108996
## 57 0.8461219800 0.8461219800
## 58 0.8311482726 0.8311482726
## 59 0.8001771234 0.8001771234
## 60 0.7941748621 0.7941748621
## 61 0.7815781359 0.7815781359
## 62 0.7777573344 0.7777573344
## 63 0.7434743876 0.7434743876
## 64 0.7198603440 0.7198603440
## 65 0.7193974412 0.7193974412
## 66 0.7171485930 0.7171485930
## 67 0.6959333531 0.6959333531
## 68 0.6864352331 0.6864352331
## 69 0.6825454126 0.6825454126
## 70 0.6813962648 0.6813962648
## 71 0.6797513875 0.6797513875
## 72 0.6783283310 0.6783283310
## 73 0.6720741547 0.6720741547
## 74 0.6694068787 0.6694068787
## 75 0.6624920927 0.6624920927
## 76 0.6547077039 0.6547077039
## 77 0.6522383813 0.6522383813
## 78 0.6337729728 0.6337729728
## 79 0.6297529954 0.6297529954
## 80 0.6282990770 0.6282990770
## 81 0.6241608824 0.6241608824
## 82 0.6147973472 0.6147973472
## 83 0.5955885197 0.5955885197
## 84 0.5917514705 0.5917514705
## 85 0.5895207014 0.5895207014
## 86 0.5885567084 0.5885567084
## 87 0.5780191999 0.5780191999
## 88 0.5588196647 0.5588196647
## 89 0.5566613917 0.5566613917
## 90 0.5375559383 0.5375559383
## 91 0.5364199053 0.5364199053
## 92 0.5313564802 0.5313564802
## 93 0.5297984673 0.5297984673
## 94 0.5241652261 0.5241652261
## 95 0.5225682560 0.5225682560
## 96 0.5062385751 0.5062385751
## 97 0.4932552137 0.4932552137
## 98 0.4932507867 0.4932507867
## 99 0.4837075315 0.4837075315
## 100 0.4822657106 0.4822657106
## 101 0.4818489830 0.4818489830
## 102 0.4519333921 0.4519333921
## 103 0.4312164931 0.4312164931
## 104 0.4151714978 0.4151714978
## 105 0.4029872749 0.4029872749
## 106 0.3947882634 0.3947882634
## 107 0.3777553026 0.3777553026
## 108 0.3597576445 0.3597576445
## 109 0.3472566892 0.3472566892
## 110 0.3467891474 0.3467891474
## 111 0.3414835417 0.3414835417
## 112 0.3324760106 0.3324760106
## 113 0.3152987043 0.3152987043
## 114 0.3023018684 0.3023018684
## 115 0.2883911587 0.2883911587
## 116 0.2800937721 0.2800937721
## 117 0.2787556157 0.2787556157
## 118 0.2688167952 0.2688167952
## 119 0.2679314095 0.2679314095
## 120 0.2475683074 0.2475683074
## 121 0.2462103094 0.2462103094
## 122 0.2439460309 0.2439460309
## 123 0.2356991189 0.2356991189
## 124 0.2187805412 0.2187805412
## 125 0.2173915680 0.2173915680
## 126 0.2092443171 0.2092443171
## 127 0.2084082463 0.2084082463
## 128 0.2029529373 0.2029529373
## 129 0.2003798803 0.2003798803
## 130 0.1900588742 0.1900588742
## 131 0.1862078969 0.1862078969
## 132 0.1654443717 0.1654443717
## 133 0.1617037263 0.1617037263
## 134 0.1531806048 0.1531806048
## 135 0.1423141757 0.1423141757
## 136 0.1363187452 0.1363187452
## 137 0.1232062894 0.1232062894
## 138 0.1104412248 0.1104412248
## 139 0.0884937539 0.0884937539
## 140 0.0839741379 0.0839741379
## 141 0.0804419256 0.0804419256
## 142 0.0785402700 0.0785402700
## 143 0.0726191100 0.0726191100
## 144 0.0645239661 0.0645239661
## 145 0.0506061632 0.0506061632
## 146 0.0458256131 0.0458256131
## 147 0.0295500822 0.0295500822
## 148 0.0074691077 0.0074691077
## 149 -0.0007011184 -0.0007011184
## 150 -0.0064416505 -0.0064416505
## 151 -0.0155381997 -0.0155381997
## 152 -0.0224435473 -0.0224435473
## 153 -0.0230627597 -0.0230627597
## 154 -0.0272048655 -0.0272048655
## 155 -0.0333093091 -0.0333093091
## 156 -0.0478401951 -0.0478401951
## 157 -0.0497759308 -0.0497759308
## 158 -0.0543289821 -0.0543289821
## 159 -0.0562290841 -0.0562290841
## 160 -0.0634126743 -0.0634126743
## 161 -0.0652337962 -0.0652337962
## 162 -0.0900412182 -0.0900412182
## 163 -0.1013843202 -0.1013843202
## 164 -0.1024504213 -0.1024504213
## 165 -0.1124549912 -0.1124549912
## 166 -0.1148361355 -0.1148361355
## 167 -0.1193318119 -0.1193318119
## 168 -0.1236664515 -0.1236664515
## 169 -0.1294409066 -0.1294409066
## 170 -0.1340565917 -0.1340565917
## 171 -0.1406496500 -0.1406496500
## 172 -0.1415512436 -0.1415512436
## 173 -0.1482721385 -0.1482721385
## 174 -0.1568179428 -0.1568179428
## 175 -0.1651867185 -0.1651867185
## 176 -0.1755212656 -0.1755212656
## 177 -0.1770466111 -0.1770466111
## 178 -0.1814507642 -0.1814507642
## 179 -0.1993313467 -0.1993313467
## 180 -0.2046496300 -0.2046496300
## 181 -0.2065001318 -0.2065001318
## 182 -0.2101802223 -0.2101802223
## 183 -0.2206178408 -0.2206178408
## 184 -0.2220606542 -0.2220606542
## 185 -0.2274809064 -0.2274809064
## 186 -0.2369917638 -0.2369917638
## 187 -0.2435812004 -0.2435812004
## 188 -0.2489134617 -0.2489134617
## 189 -0.2515380234 -0.2515380234
## 190 -0.2698387881 -0.2698387881
## 191 -0.2798640474 -0.2798640474
## 192 -0.2875745946 -0.2875745946
## 193 -0.2999117038 -0.2999117038
## 194 -0.3188356581 -0.3188356581
## 195 -0.3202366014 -0.3202366014
## 196 -0.3332083570 -0.3332083570
## 197 -0.3553729528 -0.3553729528
## 198 -0.3608986536 -0.3608986536
## 199 -0.3736483805 -0.3736483805
## 200 -0.3833792398 -0.3833792398
## 201 -0.3836541054 -0.3836541054
## 202 -0.3906954770 -0.3906954770
## 203 -0.3952586258 -0.3952586258
## 204 -0.4005372341 -0.4005372341
## 205 -0.4191896315 -0.4191896315
## 206 -0.4200863936 -0.4200863936
## 207 -0.4257586064 -0.4257586064
## 208 -0.4258943439 -0.4258943439
## 209 -0.4271015000 -0.4271015000
## 210 -0.4454602463 -0.4454602463
## 211 -0.4637763052 -0.4637763052
## 212 -0.4846128405 -0.4846128405
## 213 -0.4969050331 -0.4969050331
## 214 -0.5234514600 -0.5234514600
## 215 -0.5400280095 -0.5400280095
## 216 -0.5438365588 -0.5438365588
## 217 -0.5587734786 -0.5587734786
## 218 -0.5712055552 -0.5712055552
## 219 -0.5812270707 -0.5812270707
## 220 -0.6136262629 -0.6136262629
## 221 -0.6137377152 -0.6137377152
## 222 -0.6284929719 -0.6284929719
## 223 -0.6298347214 -0.6298347214
## 224 -0.6459461188 -0.6459461188
## 225 -0.6537186930 -0.6537186930
## 226 -0.6734727658 -0.6734727658
## 227 -0.6743991521 -0.6743991521
## 228 -0.6968637701 -0.6968637701
## 229 -0.6997265346 -0.6997265346
## 230 -0.7379594719 -0.7379594719
## 231 -0.7621201667 -0.7621201667
## 232 -0.7643717776 -0.7643717776
## 233 -0.7672932463 -0.7672932463
## 234 -0.7968403219 -0.7968403219
## 235 -0.8254412128 -0.8254412128
## 236 -0.8726435018 -0.8726435018
## 237 -0.8910936779 -0.8910936779
## 238 -0.9185024303 -0.9185024303
## 239 -0.9349863320 -0.9349863320
## 240 -0.9404324166 -0.9404324166
## 241 -0.9415220697 -0.9415220697
## 242 -0.9427086057 -0.9427086057
## 243 -0.9505149105 -0.9505149105
## 244 -0.9703707229 -0.9703707229
## 245 -0.9858251373 -0.9858251373
## 246 -1.0003229474 -1.0003229474
## 247 -1.0031500942 -1.0031500942
## 248 -1.0093379956 -1.0093379956
## 249 -1.0111287731 -1.0111287731
## 250 -1.0171877747 -1.0171877747
## 251 -1.0330678235 -1.0330678235
## 252 -1.0936076931 -1.0936076931
## 253 -1.1072980280 -1.1072980280
## 254 -1.1502807625 -1.1502807625
## 255 -1.2362346877 -1.2362346877
## 256 -1.2535737261 -1.2535737261
## 257 -1.2610492662 -1.2610492662
## 258 -1.2614545995 -1.2614545995
## 259 -1.3242804771 -1.3242804771
## 260 -1.3436752336 -1.3436752336
## 261 -1.3973838134 -1.3973838134
## 262 -1.4170959743 -1.4170959743
## 263 -1.4457616393 -1.4457616393
## 264 -1.5439461410 -1.5439461410
## 265 -1.5645077672 -1.5645077672
## 266 -1.5789871207 -1.5789871207
## 267 -1.6552044384 -1.6552044384
## 268 -1.6710730636 -1.6710730636
## 269 -1.6835838774 -1.6835838774
## 270 -1.7165556986 -1.7165556986
## 271 -1.9405152899 -1.9405152899
if(METHOD_FEATURE_FLAG==1){
# for the multi classification case,
# for each feature, we will choose the maximum importance value
# Add a column for the maximum importance
importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
importance_rf_model_df <- importance_rf_model_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_rf_model_df)
}
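The `pmax()` call above takes, for each feature, the largest importance across the three class columns before ranking. A toy base-R example with hypothetical values:

```r
# Two hypothetical features with per-class importances; rank by the
# element-wise maximum across classes.
toy <- data.frame(Feature = c("cgA", "cgB"),
                  CN = c(10, 40), Dementia = c(30, 20), MCI = c(25, 35))
toy$MaxImportance <- pmax(toy$CN, toy$Dementia, toy$MCI)  # element-wise max
toy <- toy[order(-toy$MaxImportance), ]
toy$Feature  # "cgB" (max 40) ranks ahead of "cgA" (max 30)
```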
if(METHOD_FEATURE_FLAG == 1){
importance_melted_rf_model_df <- importance_rf_model_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_rf_model_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_rf_model_df,n=20)$Feature)
importance_melted_rf_model_df <- importance_rf_model_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Freq_rf_AUC<-auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4||METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Freq_rf_AUC<-auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Freq_rf_AUC<-auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
## Area under the curve: 0.7182
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_RFM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Freq_rf_AUC<-mean_auc
}
print(FeatEval_Freq_rf_AUC)
## Area under the curve: 0.7182
df_SVM<-processed_data
featureName_SVM1<-AfterProcess_FeatureName
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]
X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)
svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
method = "svmRadial",
trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel
##
## 454 samples
## 271 predictors
## 2 classes: 'CI', 'CN'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 363, 363, 363, 363, 364
## Resampling results across tuning parameters:
##
## C Accuracy Kappa
## 0.25 0.8061538 0.5878466
## 0.50 0.8039316 0.5836392
## 1.00 0.8083272 0.5818052
##
## Tuning parameter 'sigma' was held constant at a value of 0.001883387
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.001883387 and C = 1.
print(svm_model$bestTune)
## sigma C
## 3 0.001883387 1
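Note that `svmRadial` held `sigma` fixed at a data-driven estimate and tuned only `C`. If a wider search were wanted, an explicit grid could be supplied via `tuneGrid`; the values below are illustrative, chosen around the estimated sigma (~0.0019) reported above:

```r
# Explicit tuning grid crossing several sigma values with several costs.
svm_grid <- expand.grid(sigma = c(5e-4, 2e-3, 8e-3),
                        C = c(0.25, 0.5, 1, 2, 4))
nrow(svm_grid)  # 15 candidate settings
```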
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.8061376
FeatEval_Freq_mean_accuracy_cv_svm<-mean_accuracy_svm_model
print(FeatEval_Freq_mean_accuracy_cv_svm)
## [1] 0.8061376
train_predictions <- predict(svm_model, newdata = train_data_SVM1)
train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.984581497797357"
FeatEval_Freq_svm_trainAccuracy <- train_accuracy
print(FeatEval_Freq_svm_trainAccuracy)
## [1] 0.9845815
predictions <- predict(svm_model, newdata = test_data_SVM1)
cm_FeatEval_Freq_svm<-caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_FeatEval_Freq_svm)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CI CN
## CI 113 5
## CN 15 61
##
## Accuracy : 0.8969
## 95% CI : (0.8453, 0.9359)
## No Information Rate : 0.6598
## P-Value [Acc > NIR] : 1.772e-14
##
## Kappa : 0.7785
##
## Mcnemar's Test P-Value : 0.04417
##
## Sensitivity : 0.8828
## Specificity : 0.9242
## Pos Pred Value : 0.9576
## Neg Pred Value : 0.8026
## Prevalence : 0.6598
## Detection Rate : 0.5825
## Detection Prevalence : 0.6082
## Balanced Accuracy : 0.9035
##
## 'Positive' Class : CI
##
cm_FeatEval_Freq_svm_Accuracy <- cm_FeatEval_Freq_svm$overall["Accuracy"]
cm_FeatEval_Freq_svm_Kappa <- cm_FeatEval_Freq_svm$overall["Kappa"]
print(cm_FeatEval_Freq_svm_Accuracy)
## Accuracy
## 0.8969072
print(cm_FeatEval_Freq_svm_Kappa)
## Kappa
## 0.7784882
Let’s take a look at the feature importance of the trained model.
library(iml)
predictor_SVM <- Predictor$new(svm_model,data = df_SVM,y=df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM,loss="ce")
print(importance_SVM)
## Interpretation method: FeatureImp
## error function: ce
##
## Analysed predictor:
## Prediction task: classification
## Classes:
##
## Analysed data:
## Sampling from data.frame with 648 rows and 272 columns.
##
##
## Head of results:
## feature importance.05 importance importance.95 permutation.error
## 1 cg22071943 1.1555556 1.222222 1.251852 0.05092593
## 2 cg09785377 1.0962963 1.222222 1.251852 0.05092593
## 3 cg09015880 0.9925926 1.185185 1.281481 0.04938272
## 4 cg21697769 1.1185185 1.185185 1.222222 0.04938272
## 5 cg02078724 1.1185185 1.185185 1.222222 0.04938272
## 6 cg25758034 1.1259259 1.185185 1.288889 0.04938272
plot(importance_SVM)
library(vip)
vip(svm_model, method = "permute", train = train_data_SVM1, target = "DX", nsim = 10, metric = "bal_accuracy", pred_wrapper = predict)
importance_SVM_df<-importance_SVM$results
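Since the radial-kernel SVM has no built-in importance measure, both `iml::FeatureImp` and `vip()` fall back on permutation: shuffle one feature, remeasure the loss, and report the degradation. A minimal model-agnostic sketch of that idea; the `predict_fun`/`outcome` interface here is illustrative, not the iml API:

```r
# Permutation importance: average drop in accuracy when one feature's
# values are shuffled, breaking its association with the outcome.
perm_importance <- function(predict_fun, dat, feature, outcome, nsim = 5) {
  base_acc <- mean(predict_fun(dat) == dat[[outcome]])
  drops <- replicate(nsim, {
    shuffled <- dat
    shuffled[[feature]] <- sample(shuffled[[feature]])
    base_acc - mean(predict_fun(shuffled) == shuffled[[outcome]])
  })
  mean(drops)  # large positive drop => the feature mattered
}
```

A perfectly predictive feature shows a large accuracy drop when permuted, while an irrelevant one shows none.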
if(METHOD_FEATURE_FLAG == 5){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Freq_svm_AUC<-auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 4||METHOD_FEATURE_FLAG==6){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Freq_svm_AUC<-auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Freq_svm_AUC<-auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls < cases
##
## Call:
## roc.default(response = test_data_SVM1$DX, predictor = prob_predictions[, "CI"], levels = rev(levels(test_data_SVM1$DX)))
##
## Data: prob_predictions[, "CI"] in 66 controls (test_data_SVM1$DX CN) < 128 cases (test_data_SVM1$DX CI).
## Area under the curve: 0.9632
## [1] "The AUC value is:"
## Area under the curve: 0.9632
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_SVM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Freq_svm_AUC<-mean_auc
}
print(FeatEval_Freq_svm_AUC)
## Area under the curve: 0.9632
In the INPUT Session, “Metrics_Table_Output_FLAG” controls whether this file outputs its metrics tables. These include the model-training-stage metrics and the performance metrics of the key features selected by the mean, median, and frequency methods.
Feature_and_model_Metrics <- c("Training Accuracy", "Test Accuracy", "Test Kappa", "AUC", "Average Test Accuracy during Cross Validation")
ModelTrain_stage_Logistic_metrics_ModelTrainStage <- c(modelTrain_LRM1_trainAccuracy, cm_modelTrain_LRM1_Accuracy, cm_modelTrain_LRM1_Kappa,modelTrain_LRM1_AUC, modelTrain_mean_accuracy_cv_LRM1)
ModelTrain_stage_Logistic_metrics_Feature_Mean<-c(FeatEval_Mean_LRM1_trainAccuracy,
cm_FeatEval_Mean_LRM1_Accuracy,cm_FeatEval_Mean_LRM1_Kappa,FeatEval_Mean_LRM1_AUC, FeatEval_Mean_mean_accuracy_cv_LRM1)
ModelTrain_stage_Logistic_metrics_Feature_Median<-c(FeatEval_Median_LRM1_trainAccuracy,
cm_FeatEval_Median_LRM1_Accuracy,cm_FeatEval_Median_LRM1_Kappa,FeatEval_Median_LRM1_AUC, FeatEval_Median_mean_accuracy_cv_LRM1)
ModelTrain_stage_Logistic_metrics_Feature_Freq<-c(FeatEval_Freq_LRM1_trainAccuracy,
cm_FeatEval_Freq_LRM1_Accuracy,cm_FeatEval_Freq_LRM1_Kappa,FeatEval_Freq_LRM1_AUC,FeatEval_Freq_mean_accuracy_cv_LRM1)
ModelTrain_stage_Logistic_metrics<-c(ModelTrain_stage_Logistic_metrics_ModelTrainStage, ModelTrain_stage_Logistic_metrics_Feature_Mean,ModelTrain_stage_Logistic_metrics_Feature_Median,ModelTrain_stage_Logistic_metrics_Feature_Freq)
ModelTrain_stage_ElasticNet_metrics_ModelTrainStage <- c(modelTrain_ENM1_trainAccuracy, cm_modelTrain_ENM1_Accuracy, cm_modelTrain_ENM1_Kappa,modelTrain_ENM1_AUC, modelTrain_mean_accuracy_cv_ENM1)
ModelTrain_stage_ElasticNet_metrics_Feature_Mean<-c(FeatEval_Mean_ENM1_trainAccuracy,
cm_FeatEval_Mean_ENM1_Accuracy,cm_FeatEval_Mean_ENM1_Kappa,FeatEval_Mean_ENM1_AUC, FeatEval_Mean_mean_accuracy_cv_ENM1)
ModelTrain_stage_ElasticNet_metrics_Feature_Median<-c(FeatEval_Median_ENM1_trainAccuracy,
cm_FeatEval_Median_ENM1_Accuracy,cm_FeatEval_Median_ENM1_Kappa,FeatEval_Median_ENM1_AUC, FeatEval_Median_mean_accuracy_cv_ENM1)
ModelTrain_stage_ElasticNet_metrics_Feature_Freq<-c(FeatEval_Freq_ENM1_trainAccuracy,
cm_FeatEval_Freq_ENM1_Accuracy,cm_FeatEval_Freq_ENM1_Kappa,FeatEval_Freq_ENM1_AUC,FeatEval_Freq_mean_accuracy_cv_ENM1)
ModelTrain_stage_ElasticNet_metrics<-c(ModelTrain_stage_ElasticNet_metrics_ModelTrainStage, ModelTrain_stage_ElasticNet_metrics_Feature_Mean,ModelTrain_stage_ElasticNet_metrics_Feature_Median,ModelTrain_stage_ElasticNet_metrics_Feature_Freq)
ModelTrain_stage_XGBoost_metrics_ModelTrainStage <- c(modelTrain_xgb_trainAccuracy, cm_modelTrain_xgb_Accuracy, cm_modelTrain_xgb_Kappa,modelTrain_xgb_AUC, modelTrain_mean_accuracy_cv_xgb)
ModelTrain_stage_XGBoost_metrics_Feature_Mean<-c(FeatEval_Mean_xgb_trainAccuracy,
cm_FeatEval_Mean_xgb_Accuracy,cm_FeatEval_Mean_xgb_Kappa,FeatEval_Mean_xgb_AUC, FeatEval_Mean_mean_accuracy_cv_xgb)
ModelTrain_stage_XGBoost_metrics_Feature_Median<-c(FeatEval_Median_xgb_trainAccuracy,
cm_FeatEval_Median_xgb_Accuracy,cm_FeatEval_Median_xgb_Kappa,FeatEval_Median_xgb_AUC, FeatEval_Median_mean_accuracy_cv_xgb)
ModelTrain_stage_XGBoost_metrics_Feature_Freq<-c(FeatEval_Freq_xgb_trainAccuracy,
cm_FeatEval_Freq_xgb_Accuracy,cm_FeatEval_Freq_xgb_Kappa,FeatEval_Freq_xgb_AUC,FeatEval_Freq_mean_accuracy_cv_xgb)
ModelTrain_stage_XGBoost_metrics<-c(ModelTrain_stage_XGBoost_metrics_ModelTrainStage, ModelTrain_stage_XGBoost_metrics_Feature_Mean,ModelTrain_stage_XGBoost_metrics_Feature_Median,ModelTrain_stage_XGBoost_metrics_Feature_Freq)
ModelTrain_stage_RandomForest_metrics_ModelTrainStage <- c(modelTrain_rf_trainAccuracy, cm_modelTrain_rf_Accuracy, cm_modelTrain_rf_Kappa,modelTrain_rf_AUC, modelTrain_mean_accuracy_cv_rf)
ModelTrain_stage_RandomForest_metrics_Feature_Mean<-c(FeatEval_Mean_rf_trainAccuracy,
cm_FeatEval_Mean_rf_Accuracy,cm_FeatEval_Mean_rf_Kappa,FeatEval_Mean_rf_AUC, FeatEval_Mean_mean_accuracy_cv_rf)
ModelTrain_stage_RandomForest_metrics_Feature_Median<-c(FeatEval_Median_rf_trainAccuracy,
cm_FeatEval_Median_rf_Accuracy,cm_FeatEval_Median_rf_Kappa,FeatEval_Median_rf_AUC, FeatEval_Median_mean_accuracy_cv_rf)
ModelTrain_stage_RandomForest_metrics_Feature_Freq<-c(FeatEval_Freq_rf_trainAccuracy,
cm_FeatEval_Freq_rf_Accuracy,cm_FeatEval_Freq_rf_Kappa,FeatEval_Freq_rf_AUC,FeatEval_Freq_mean_accuracy_cv_rf)
ModelTrain_stage_RandomForest_metrics<-c(ModelTrain_stage_RandomForest_metrics_ModelTrainStage, ModelTrain_stage_RandomForest_metrics_Feature_Mean,ModelTrain_stage_RandomForest_metrics_Feature_Median,ModelTrain_stage_RandomForest_metrics_Feature_Freq)
ModelTrain_stage_SVM_metrics_ModelTrainStage <- c(modelTrain_svm_trainAccuracy, cm_modelTrain_svm_Accuracy, cm_modelTrain_svm_Kappa,modelTrain_svm_AUC, modelTrain_mean_accuracy_cv_svm)
ModelTrain_stage_SVM_metrics_Feature_Mean<-c(FeatEval_Mean_svm_trainAccuracy,
cm_FeatEval_Mean_svm_Accuracy,cm_FeatEval_Mean_svm_Kappa,FeatEval_Mean_svm_AUC, FeatEval_Mean_mean_accuracy_cv_svm)
ModelTrain_stage_SVM_metrics_Feature_Median<-c(FeatEval_Median_svm_trainAccuracy,
cm_FeatEval_Median_svm_Accuracy,cm_FeatEval_Median_svm_Kappa,FeatEval_Median_svm_AUC, FeatEval_Median_mean_accuracy_cv_svm)
ModelTrain_stage_SVM_metrics_Feature_Freq<-c(FeatEval_Freq_svm_trainAccuracy,
cm_FeatEval_Freq_svm_Accuracy,cm_FeatEval_Freq_svm_Kappa,FeatEval_Freq_svm_AUC,FeatEval_Freq_mean_accuracy_cv_svm)
ModelTrain_stage_SVM_metrics<-c(ModelTrain_stage_SVM_metrics_ModelTrainStage, ModelTrain_stage_SVM_metrics_Feature_Mean,ModelTrain_stage_SVM_metrics_Feature_Median,ModelTrain_stage_SVM_metrics_Feature_Freq)
# Map the method flag to a human-readable classification type
classifcationType <- switch(as.character(METHOD_FEATURE_FLAG),
  "1" = "Multiclass",
  "2" = "Multiclass and use PCA",
  "3" = "Binary",
  "4" = "CN vs Dementia (AD)",
  "5" = "CN vs MCI",
  "6" = "MCI vs Dementia"
)
library(dplyr)
# Assemble all metrics into one summary data frame:
# 20 rows = 4 stages (model training + three selection methods) x 5 metrics
Metrics_results_df <- data.frame(
`Number_of_CpG_used` = rep(Number_N_TopNCpGs, 20),
`Number_of_Phenotype_Features_Used` = rep(5, 20),
`Total_Number_of_features_before_Preprocessing` = rep(Number_N_TopNCpGs+5, 20),
`Number_of_features_after_processing` = rep(Num_feaForProcess, 20),
`Classification_Type` = rep(classifcationType, 20),
`Number_of_Key_features_Selected_(Mean,Median)` = rep(INPUT_NUMBER_FEATURES, 20),
`Number_of_Key_features_remained_based_on_frequency_methods` = rep(Num_KeyFea_Frequency, 20),
`Metrics_Stage` = c(rep("Model Train Stage",5),rep("Key Feature Evaluation (Select based on Mean) ",5),rep("Key Feature Evaluation (Select based on Median) ",5),rep("Key Feature Evaluation (Select based on Frequency) ",5)),
`Metric` = rep(Feature_and_model_Metrics, 4),
`Logistic_regression` = c(ModelTrain_stage_Logistic_metrics),
`Elastic_Net` = c(ModelTrain_stage_ElasticNet_metrics),
`XGBoost` = c(ModelTrain_stage_XGBoost_metrics),
`Random_Forest` = c(ModelTrain_stage_RandomForest_metrics),
`SVM` = c(ModelTrain_stage_SVM_metrics)
)
print(Metrics_results_df)
## Number_of_CpG_used Number_of_Phenotype_Features_Used Total_Number_of_features_before_Preprocessing Number_of_features_after_processing Classification_Type
## 1 5000 5 5005 313 Binary
## 2 5000 5 5005 313 Binary
## 3 5000 5 5005 313 Binary
## 4 5000 5 5005 313 Binary
## 5 5000 5 5005 313 Binary
## 6 5000 5 5005 313 Binary
## 7 5000 5 5005 313 Binary
## 8 5000 5 5005 313 Binary
## 9 5000 5 5005 313 Binary
## 10 5000 5 5005 313 Binary
## 11 5000 5 5005 313 Binary
## 12 5000 5 5005 313 Binary
## 13 5000 5 5005 313 Binary
## 14 5000 5 5005 313 Binary
## 15 5000 5 5005 313 Binary
## 16 5000 5 5005 313 Binary
## 17 5000 5 5005 313 Binary
## 18 5000 5 5005 313 Binary
## 19 5000 5 5005 313 Binary
## 20 5000 5 5005 313 Binary
## Number_of_Key_features_Selected_.Mean.Median. Number_of_Key_features_remained_based_on_frequency_methods Metrics_Stage
## 1 250 271 Model Train Stage
## 2 250 271 Model Train Stage
## 3 250 271 Model Train Stage
## 4 250 271 Model Train Stage
## 5 250 271 Model Train Stage
## 6 250 271 Key Feature Evaluation (Select based on Mean)
## 7 250 271 Key Feature Evaluation (Select based on Mean)
## 8 250 271 Key Feature Evaluation (Select based on Mean)
## 9 250 271 Key Feature Evaluation (Select based on Mean)
## 10 250 271 Key Feature Evaluation (Select based on Mean)
## 11 250 271 Key Feature Evaluation (Select based on Median)
## 12 250 271 Key Feature Evaluation (Select based on Median)
## 13 250 271 Key Feature Evaluation (Select based on Median)
## 14 250 271 Key Feature Evaluation (Select based on Median)
## 15 250 271 Key Feature Evaluation (Select based on Median)
## 16 250 271 Key Feature Evaluation (Select based on Frequency)
## 17 250 271 Key Feature Evaluation (Select based on Frequency)
## 18 250 271 Key Feature Evaluation (Select based on Frequency)
## 19 250 271 Key Feature Evaluation (Select based on Frequency)
## 20 250 271 Key Feature Evaluation (Select based on Frequency)
## Metric Logistic_regression Elastic_Net XGBoost Random_Forest SVM
## 1 Training Accuracy 0.9977974 0.9735683 1.0000000 1.0000000000 0.9911894
## 2 Test Accuracy 0.8762887 0.8969072 0.7525773 0.6701030928 0.8659794
## 3 Test Kappa 0.7095084 0.7560362 0.3755365 0.0487281643 0.7057863
## 4 AUC 0.9148911 0.9457860 0.7602983 0.6900449811 0.9156013
## 5 Average Test Accuracy during Cross Validation 0.7295157 0.7226380 0.6381592 0.6637606838 0.8296215
## 6 Training Accuracy 0.9977974 0.9977974 1.0000000 1.0000000000 0.9801762
## 7 Test Accuracy 0.8556701 0.9020619 0.6907216 0.6701030928 0.8505155
## 8 Test Kappa 0.6636949 0.7709136 0.2321900 0.0664661654 0.6730210
## 9 AUC 0.8995028 0.9157197 0.7398201 0.7015861742 0.9421165
## 10 Average Test Accuracy during Cross Validation 0.7483299 0.7279298 0.6529530 0.6659584860 0.8259829
## 11 Training Accuracy 0.9977974 0.9977974 1.0000000 1.0000000000 0.9845815
## 12 Test Accuracy 0.8711340 0.8917526 0.6701031 0.6546391753 0.8195876
## 13 Test Kappa 0.7031460 0.7487357 0.1810026 -0.0006158584 0.6109774
## 14 AUC 0.9092093 0.9243608 0.7262074 0.7142518939 0.9070786
## 15 Average Test Accuracy during Cross Validation 0.7681699 0.7355177 0.6460783 0.6622629223 0.8457957
## 16 Training Accuracy 1.0000000 0.9977974 1.0000000 1.0000000000 0.9845815
## 17 Test Accuracy 0.8608247 0.8917526 0.6855670 0.6804123711 0.8969072
## 18 Test Kappa 0.6818127 0.7467993 0.2096968 0.0784554091 0.7784882
## 19 AUC 0.9074337 0.9314631 0.7292850 0.7181581439 0.9631866
## 20 Average Test Accuracy during Cross Validation 0.7461349 0.7288590 0.6481660 0.6644851445 0.8061376
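A note on the printed column names: `data.frame()` with its default `check.names = TRUE` passes the supplied names through `make.names()`, which is why `Number_of_Key_features_Selected_(Mean,Median)` prints above as `Number_of_Key_features_Selected_.Mean.Median.`. A minimal illustration (the toy column name is hypothetical):

```r
# make.names() replaces characters that are invalid in R names
# (parentheses, commas, spaces, ...) with dots.
make.names("Number_of_Key_features_Selected_(Mean,Median)")
# [1] "Number_of_Key_features_Selected_.Mean.Median."

# Passing check.names = FALSE keeps the supplied name verbatim.
df <- data.frame(`A_(x,y)` = 1:2, check.names = FALSE)
names(df)
# [1] "A_(x,y)"
```

If the verbatim column names are wanted in the CSV output, `check.names = FALSE` could be added to the `data.frame()` call above.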
Write the model-metrics data frame out to a CSV file when FLAG_WRITE_METRICS_DF = TRUE:
if(FLAG_WRITE_METRICS_DF){
write.csv(Metrics_results_df,OUTUT_PerformanceMetricsCSV_PATHNAME,row.names = FALSE)
print("Metrics Performance output path:")
print(OUTUT_PerformanceMetricsCSV_PATHNAME)
}
## [1] "Metrics Performance output path:"
## [1] "C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method3_BinaryClass_CN_vs_CI\\Method3_BinaryClass_PerformanceMetrics\\INPUT_5000CpGs_250SelFeature_PerMetrics.csv"
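One fragility worth noting: `write.csv()` errors if the output directory does not yet exist. A small sketch of a guard that could precede the write; `out_path` here is an illustrative stand-in, not the pipeline's actual configured path:

```r
# Sketch: ensure the parent directory exists before writing a metrics CSV.
# out_path is a hypothetical stand-in for the configured output path.
out_path <- file.path(tempdir(), "PerformanceMetrics", "demo_metrics.csv")
dir.create(dirname(out_path), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(Metric = "AUC", Value = 0.91), out_path, row.names = FALSE)
file.exists(out_path)
# [1] TRUE
```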
Phenotype part data frame: “phenoticPart_RAW”
Raw merged data frame: “merged_df_raw”
Processed data, i.e., the data used for model training.
The name of “processed_data” can be:
“processed_data_m1”, which uses method one to process the data.
“processed_data_m2”, which uses method two to process the data; note that the features will be principal components.
“processed_data_m3”, which uses method three to process the data. This method converts “DX” to a binary class: “CN” stays the same, while “MCI” and “Dementia” are recoded to “CI”.
Note that “processed_data_m3_df” is the data-frame form of “processed_data_m3”, with sample names as row names, and is assigned to “processed_dataFrame”.
“processed_data_m4”, which uses method four to process the data. This method filters “DX” (dropping the “MCI” class), limiting the data to the CN and Dementia (AD) classes.
“processed_data_m5”, which uses method five to process the data. This method filters “DX” (dropping the “Dementia” class), limiting the data to the CN and MCI classes.
“processed_data_m6”, which uses method six to process the data. This method filters “DX” (dropping the “CN” class), limiting the data to the MCI and Dementia classes.
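The method-three relabeling can be sketched as follows; the toy data frame is hypothetical, and only the CN / MCI / Dementia labels are taken from the pipeline:

```r
# Sketch of the method-three relabeling: CN stays CN,
# while MCI and Dementia collapse into the single "CI" class.
toy <- data.frame(SampleID = paste0("S", 1:4),
                  DX = c("CN", "MCI", "Dementia", "CN"))
toy$DX_binary <- factor(ifelse(toy$DX == "CN", "CN", "CI"),
                        levels = c("CN", "CI"))
table(toy$DX_binary)
# CN CI
#  2  2
```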
The name of “AfterProcess_FeatureName” can be:
Feature importance ordered by quantile, as a data frame: “combined_importance_quantiles”
Feature importance ordered by mean, as a data frame: “combined_importance_Avg_ordered”
Feature frequency / common-feature data frames:
“frequency_feature_df_RAW_ordered”: the selected features’ frequencies, ordered by total frequency count. The top number selected in the first step is set in the input session via “INPUT_NUMBER_FEATURES”.
“feature_df_full”: the frequencies of all features from the steps of the frequency method; it is not ordered.
“all_combined_df_impAvg”: the combined table of frequency and feature importance; it is not ordered.
Output data frame with features selected by the mean method: “df_selected_Mean”. This data frame does not have a column named “SampleID”.
Output data frame with features selected by the median method: “df_selected_Median”. This data frame does not have a column named “SampleID”.
Output data frame with features selected by the frequency / common-feature method: “df_process_Output_freq”. This data frame does not have a column named “SampleID”.
And the feature names: “df_process_frequency_FeatureName”
“df_feature_Output_frequency”: the selected features’ frequencies, ordered by total frequency count. The top number selected in the first step is set in the input session via “NUM_COMMON_FEATURES_SET_Frequency”.
“Selected_Frequency_Feature_importance”: the importance values of the selected features, ordered by total frequency count.
“feature_output_df_full”: the frequencies of all features from the steps of the frequency method; it is not ordered.
“all_Output_combined_df_impAvg”: the combined table of frequency and feature importance; it is not ordered.
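The frequency / common-feature counting described above can be sketched with hypothetical per-model top-feature lists: count how many models rank each feature in their top list, then keep the features with the highest total counts. The feature names and model lists below are illustrative only:

```r
# Hypothetical top-feature lists, one per trained model.
top_features <- list(
  logistic = c("cg001", "cg002", "cg003"),
  xgboost  = c("cg002", "cg003", "cg004"),
  rf       = c("cg002", "cg005", "cg001")
)

# Count, for each feature, how many models ranked it in their top list,
# then order by total count (most common first).
freq <- sort(table(unlist(top_features)), decreasing = TRUE)
freq_df <- data.frame(Feature = names(freq),
                      Total_count = as.integer(freq))
head(freq_df, 3)
```

Here `cg002` is ranked by all three models and would head the frequency-ordered table.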
Number of CpGs used: “Number_N_TopNCpGs”
Phenotype features selected:
Number of features before processing: (# phenotype features selected) + (# CpGs used)
Number of features after processing (DMP, data cleaning): “Num_feaForProcess”
Model performance (variable names), model training stage:
| Initial Model Training Metric | Logistic regression | Elastic Net | XGBoost | Random Forest | SVM |
|---|---|---|---|---|---|
| Training Accuracy | modelTrain_LRM1_trainAccuracy | modelTrain_ENM1_trainAccuracy | modelTrain_xgb_trainAccuracy | modelTrain_rf_trainAccuracy | modelTrain_svm_trainAccuracy |
| Test Accuracy | cm_modelTrain_LRM1_Accuracy | cm_modelTrain_ENM1_Accuracy | cm_modelTrain_xgb_Accuracy | cm_modelTrain_rf_Accuracy | cm_modelTrain_svm_Accuracy |
| Test Kappa | cm_modelTrain_LRM1_Kappa | cm_modelTrain_ENM1_Kappa | cm_modelTrain_xgb_Kappa | cm_modelTrain_rf_Kappa | cm_modelTrain_svm_Kappa |
| AUC (for multiclass, mean one-vs-rest AUC) | modelTrain_LRM1_AUC | modelTrain_ENM1_AUC | modelTrain_xgb_AUC | modelTrain_rf_AUC | modelTrain_svm_AUC |
| Average Test Accuracy during Cross Validation | modelTrain_mean_accuracy_cv_LRM1 | modelTrain_mean_accuracy_cv_ENM1 | modelTrain_mean_accuracy_cv_xgb | modelTrain_mean_accuracy_cv_rf | modelTrain_mean_accuracy_cv_svm |
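The multiclass AUC entry above averages one-vs-rest binary AUCs. A self-contained sketch using the rank (Mann-Whitney) formulation of binary AUC; the label vector and probability matrix are hypothetical stand-ins for a model's class-probability output:

```r
# Binary AUC via the Mann-Whitney U statistic (base R only).
auc_binary <- function(is_pos, score) {
  r  <- rank(score)
  n1 <- sum(is_pos)
  n0 <- sum(!is_pos)
  (sum(r[is_pos]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# One-vs-rest mean AUC: one binary AUC per class, then average.
# prob: samples x classes matrix of predicted class probabilities.
ovr_mean_auc <- function(labels, prob) {
  mean(vapply(colnames(prob),
              function(cl) auc_binary(labels == cl, prob[, cl]),
              numeric(1)))
}

labels <- c("CN", "CN", "MCI", "Dementia")
prob <- matrix(c(0.8, 0.7, 0.2, 0.1,   # P(CN)
                 0.1, 0.2, 0.6, 0.3,   # P(MCI)
                 0.1, 0.1, 0.2, 0.6),  # P(Dementia)
               ncol = 3,
               dimnames = list(NULL, c("CN", "MCI", "Dementia")))
ovr_mean_auc(labels, prob)
# [1] 1  (this toy example is perfectly separated)
```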
Number of key features selected (mean/median methods): “INPUT_NUMBER_FEATURES”
Number of key features retained by the frequency method: “Num_KeyFea_Frequency”
Performance of the selected key-feature sets (under the three methods):
Based on Mean:
| Key Features Performance Selected based on Mean | Logistic Regression | Elastic Net | XGBoost | Random Forest | SVM |
|---|---|---|---|---|---|
| Training Accuracy | FeatEval_Mean_LRM1_trainAccuracy | FeatEval_Mean_ENM1_trainAccuracy | FeatEval_Mean_xgb_trainAccuracy | FeatEval_Mean_rf_trainAccuracy | FeatEval_Mean_svm_trainAccuracy |
| Test Accuracy | cm_FeatEval_Mean_LRM1_Accuracy | cm_FeatEval_Mean_ENM1_Accuracy | cm_FeatEval_Mean_xgb_Accuracy | cm_FeatEval_Mean_rf_Accuracy | cm_FeatEval_Mean_svm_Accuracy |
| Test Kappa | cm_FeatEval_Mean_LRM1_Kappa | cm_FeatEval_Mean_ENM1_Kappa | cm_FeatEval_Mean_xgb_Kappa | cm_FeatEval_Mean_rf_Kappa | cm_FeatEval_Mean_svm_Kappa |
| AUC (for multiclass, mean one-vs-rest AUC) | FeatEval_Mean_LRM1_AUC | FeatEval_Mean_ENM1_AUC | FeatEval_Mean_xgb_AUC | FeatEval_Mean_rf_AUC | FeatEval_Mean_svm_AUC |
| Average Test Accuracy during Cross Validation | FeatEval_Mean_mean_accuracy_cv_LRM1 | FeatEval_Mean_mean_accuracy_cv_ENM1 | FeatEval_Mean_mean_accuracy_cv_xgb | FeatEval_Mean_mean_accuracy_cv_rf | FeatEval_Mean_mean_accuracy_cv_svm |
Based on Median:
| Key Features Performance Selected based on Median | Logistic Regression | Elastic Net | XGBoost | Random Forest | SVM |
|---|---|---|---|---|---|
| Training Accuracy | FeatEval_Median_LRM1_trainAccuracy | FeatEval_Median_ENM1_trainAccuracy | FeatEval_Median_xgb_trainAccuracy | FeatEval_Median_rf_trainAccuracy | FeatEval_Median_svm_trainAccuracy |
| Test Accuracy | cm_FeatEval_Median_LRM1_Accuracy | cm_FeatEval_Median_ENM1_Accuracy | cm_FeatEval_Median_xgb_Accuracy | cm_FeatEval_Median_rf_Accuracy | cm_FeatEval_Median_svm_Accuracy |
| Test Kappa | cm_FeatEval_Median_LRM1_Kappa | cm_FeatEval_Median_ENM1_Kappa | cm_FeatEval_Median_xgb_Kappa | cm_FeatEval_Median_rf_Kappa | cm_FeatEval_Median_svm_Kappa |
| AUC (for multiclass, mean one-vs-rest AUC) | FeatEval_Median_LRM1_AUC | FeatEval_Median_ENM1_AUC | FeatEval_Median_xgb_AUC | FeatEval_Median_rf_AUC | FeatEval_Median_svm_AUC |
| Average Test Accuracy during Cross Validation | FeatEval_Median_mean_accuracy_cv_LRM1 | FeatEval_Median_mean_accuracy_cv_ENM1 | FeatEval_Median_mean_accuracy_cv_xgb | FeatEval_Median_mean_accuracy_cv_rf | FeatEval_Median_mean_accuracy_cv_svm |
Based on Frequency:
| Key Features Performance Selected based on Frequency | Logistic Regression | Elastic Net | XGBoost | Random Forest | SVM |
|---|---|---|---|---|---|
| Training Accuracy | FeatEval_Freq_LRM1_trainAccuracy | FeatEval_Freq_ENM1_trainAccuracy | FeatEval_Freq_xgb_trainAccuracy | FeatEval_Freq_rf_trainAccuracy | FeatEval_Freq_svm_trainAccuracy |
| Test Accuracy | cm_FeatEval_Freq_LRM1_Accuracy | cm_FeatEval_Freq_ENM1_Accuracy | cm_FeatEval_Freq_xgb_Accuracy | cm_FeatEval_Freq_rf_Accuracy | cm_FeatEval_Freq_svm_Accuracy |
| Test Kappa | cm_FeatEval_Freq_LRM1_Kappa | cm_FeatEval_Freq_ENM1_Kappa | cm_FeatEval_Freq_xgb_Kappa | cm_FeatEval_Freq_rf_Kappa | cm_FeatEval_Freq_svm_Kappa |
| AUC (for multiclass, mean one-vs-rest AUC) | FeatEval_Freq_LRM1_AUC | FeatEval_Freq_ENM1_AUC | FeatEval_Freq_xgb_AUC | FeatEval_Freq_rf_AUC | FeatEval_Freq_svm_AUC |
| Average Test Accuracy during Cross Validation | FeatEval_Freq_mean_accuracy_cv_LRM1 | FeatEval_Freq_mean_accuracy_cv_ENM1 | FeatEval_Freq_mean_accuracy_cv_xgb | FeatEval_Freq_mean_accuracy_cv_rf | FeatEval_Freq_mean_accuracy_cv_svm |